Foreword


OUR READERS will recall the Special Issue on Metalinguistics, ETC., IX, 161-240
(Spring 1952), which was entirely devoted to matters pertaining to
those aspects of language which seem to mould the patterns of thought. The 
response to that issue was so encouraging that we immediately made plans for 
another similar attempt. The present issue is the result. 

The common theme in the material we have selected is Information and 
Communication. What makes the recent developments in this field especially 
interesting to students of general semantics is that they are centered around the 
foundations of a mathematical theory of information and communication. To 
quote D. M. MacKay, 


In everyday speech we say we have received Information, when we 
know something that we did not know before: when “what we know” has
changed. If then we were able to measure “what we know,” we could talk
meaningfully about the amount of information we have received, in terms 
of the measurable change it has caused. This would be invaluable in 
assessing and comparing the efficiency of methods of gaining or commu- 
nicating information. 

Information Theory is concerned with this problem of measuring 
changes in knowledge. Its key is the fact that we can represent what we 
know by means of pictures, sentences, models or the like. When we re- 
ceive information, it causes a change in the symbolic picture or representa- 
tion which we would use to depict what we know. It is found that changes 
in representations can be measured; so “amount of information,” actually
in more than one sense, can be given numerical meaning.¹


A MATHEMATICIZED science is a non-aristotelian science, because it is con- 
cerned with dynamic relations among events rather than static properties 
of objects, with the structure of a phenomenon rather than its “nature.” Physics
became non-aristotelian when Galileo substituted extensionally oriented measurement
for the intensionally oriented categorical classification of aristotelian
philosophy. 

The greatest triumphs of science were successful mathematizations: of statics 
by Archimedes, of dynamics by Galileo and Newton, of thermodynamics by 
Helmholtz and Gibbs, of electro-magnetic phenomena by Maxwell, of genetics 
by Mendel, of evolution by Haldane, Fisher, and Wright, of atomic structure by 
Schroedinger, Jordan, and Heisenberg. 


¹ D. M. MacKay, “The Nomenclature of Information Theory.” Symposium on Information
Theory. London: Ministry of Supply, 1950.
The importance of the new theories of information and communication is in
that they extend the mathematical method to include events that had long been 
considered beyond the scope of the exact sciences. In fact, questions like “How 
much does he know?” or “How much has he said?” are still considered by most 
people as metaphorical uses of the concept “How much?” In the mathematical 
theory of information and communication, these questions acquire precise mean- 
ing, that is, they indicate operationally meaningful procedures aimed at obtaining 
answers. True, these procedures can be applied as yet only in very circumscribed 
situations. It must not be supposed that the communication engineers are about 
to build a device which one can attach to somebody's head in order to read off 
the extent of his knowledge or that the meaningfulness of a political speech 
can soon be indicated on a scale. The applications of the new theories are much 
more modest. They are concentrated in the technical aspects of signal trans- 
mission and deal mostly with the engineering problems of communication: 
channel capacities, coding systems, fidelity, types of modulation, etc. 

But there is more to these problems than electronics. For the communication 
engineer is now interested not only in the signal but also in something which 
is carried by the signal, namely “information.” Modern communication engineering
stands roughly in a similar relation to physics as psychology stands in
relation to physiology. The physiologist may study in greatest detail the physico- 
chemical events involved in the propagation of a nerve impulse. But the psy- 
chologist is interested in more than that, namely in the information carried by 
nerve impulses which results in the integration of the over-all behavior of the 
organism, enabling it to perform complicated tasks. 

It is not surprising, therefore, that the mathematical theories of communica- 
tion and information find increasing application not only in traditional com- 
munication engineering (telegraphy, telephony, radio, television) but also in 
the design of “mechanical brains” and servo-mechanisms. Truly, then, these new
sciences can be called the beginnings of technological psychology. 


THERE IS NO NEED to belabor the point that just as classical physics and chemistry
have provided the scientific basis for physiology, so information and
communication theory can be expected to provide a similar basis for psychology 
and thus contribute to what Korzybski called The Science of Man. In fact, an 
important part of information theory deals with something called “the amount
of organization.” Various guesses have been made about the importance of this
concept for the understanding of the life process itself, not only in its physio- 
logical aspect but also in the social, as when a collection of individuals (an ant 
hill or a nation) is viewed as an “epi-organism”² with its own structure, behavior
pattern, and life cycle. These ideas have been put forward by some of
the leading thinkers of our time, notably by E. Schroedinger, K. Deutsch, and
N. Wiener.³ Obviously the discovery of significant invariants related to the concept
of organization would be of comparable importance in biology to the discovery
of the conservation of energy in physics. Here may be the richest fruits of
mathematization.


² A word coined by R. W. Gerard to apply to a social collection of organisms viewed
as an organism.

On the other hand, it cannot be emphasized too strongly that there is always 
a gulf between what is hoped for and what is actually achieved. Compared with 
the potentialities of information and communication theory, its achievements 
(1953) have been modest. But so were the achievements of Volta, Coulomb, 
Faraday, and Hertz compared to the potentialities of electricity. No power dams 
or radio transmitters were built by those men. But none would ever have been 
built if it were not for their work. 

Finally, it should be pointed out that there need be no conflict between an 
exact operational, mathematical approach to a set of problems and a philosophical, 
humanistic approach relying on intuitive insights. Such a conflict arises only when 
the proponents of one approach feel threatened by those of the other. To be 
sure, there has been a great deal of conflict in the areas disputed by the 
physicists and the humanists. It is commonplace for physicists, positivists, and 
behaviorists to declare that there is no place in science for anything but con- 
trolled experiment and rigorous reasoning. It is equally commonplace for many 
social scientists, psychologists, and psychiatrists to declare in self-defense that
there are areas beyond the reach of exact analytical inquiry. It is our feeling that 
the Science of Man would be enhanced if the two approaches were considered 
complementary instead of mutually exclusive. 

Thus the insights of the psychiatrist, the anthropologist, and the “practical” 
group-leader into the nature of communication may be as helpful in the long 
run in furthering our understanding as the self-contained deductions of the 
mathematician. Already the former have served to focus the attention of the 
mathematician on particularly promising problems. A case in point is the experimental
work of A. Bavelas.⁴ Bavelas was prompted by the insights of Kurt Lewin
concerning the role of communication in group behavior. Lewin, in spite of 
mathematical-sounding formulations of his work, was no mathematician. For that 
matter, neither does Bavelas’ work rest on mathematical theory. But it is already 
in a form which can be mathematicized. This example may serve to indicate how 
the science of communication and information can be cooperatively pursued by 
both intuitive and exact methods. 


³ E. Schroedinger, What Is Life? (New York: The Macmillan Company, 1945). K.
Deutsch, “On Communication Models in the Social Sciences.” Public Opinion Quarterly,
Vol. 16, No. 3 (Fall 1952). N. Wiener, The Human Use of Human Beings. (Boston:
Houghton, Mifflin Co., 1950).

⁴ A. Bavelas, “Communication Patterns in Task-oriented Groups.” The Journal of the
Acoustical Society of America, Vol. 22, No. 6 (November 1950).
ALTHOUGH THE EMPHASIS in this issue is on the mathematical theories of
information and communication, the reader will also find articles dealing 
with broader if less precise aspects of those theories. He will find in Warren 
Weaver's article the sort of speculation that often serves as intellectual recon- 
naissance, a forerunner of a concerted and disciplined assault on a new field; in 
Gregory Bateson’s metalogue a relaxed slipper-and-pipe probing of those exciting 
implications which make the most profound philosophical ideas sources of con- 
versational adventures with children (as Charles Dodgson, alias Lewis Carroll, 
discovered many years ago) ; in R. W. Gerard’s poem that make-believe tongue- 
in-cheek naiveté, which has become the mark of the American Intellectual. 

Incidentally, we thought it was appropriate to punctuate the completion of 
the first decade of ETC. by a poem. Our readers will recall that the first issue 
of ETC. also contained a poem by E. E. Cummings, from which this Review 
got its name. Cummings’ poem was an ode to extensional orientation. Dr. 
Gerard’s poem is a tribute to the theories of the nervous system which have 
emerged in the last decade. It was written to commemorate the Macy Con- 
ferences on Circular, Causal, and Feedback Systems. The conferences were held 
twice a year for the past five years. They were typical of the modern trend to 
break down communication barriers among the specialists of various disciplines 
and to emphasize a unified approach to the Science of Man. Many recent ideas 
on cybernetics and on the role of information theory in the study of behavior 
were given impetus by the exchange of ideas at those conferences. 


IT APPEARS to many students of general semantics that Korzybski’s program,
ambitious beyond the capacities of any single individual, is being realized 
through a new kind of scientific activity—the interdisciplinary team. It is 
through such cooperation that the truly new movements in modern science
(notably cybernetics, information theory, mathematical biology, etc.) have found 
their most fruitful expression. The editors of ETC. feel that such cooperation 
should be extended so as to close the gap between the “scientist” and the
“artist,” the “expert” and the “layman,” between “thinking” and “feeling.”
Therefore we hail the attempts of a philosopher to explain information theory to 
a child and the attempt of an outstanding authority on nerve physiology to write 
a funny poem as symptoms of genuine democratization of knowledge.—A.R. 





WHAT IS INFORMATION? 


ANATOL RAPOPORT 


SUPPOSE SOMEONE tosses a penny, and you try to guess “heads” or “tails.”
Every time you guess correctly you win the penny, and every time you guess 
wrong you pay your opponent a penny. You have a fifty-fifty chance to win on 
each throw. If you keep playing long enough, unless you are extremely lucky or 
unlucky, your winnings will about equal your losses.¹

Now suppose some character comes along and tells you he has a crystal ball 
through which he can see how the penny falls, and that for a price, he will 
signal this information to you, so that you can win every time. You have no 
scruples about playing the game fairly (you are the “economic man” that classical
theoreticians of economics keep talking about). What is the information offered 
worth to you? 

A common sense argument shows that if the crystal ball really works, the 
information is worth to you anywhere up to a penny a reading. If you pay a 
whole penny, you will win all of your opponent’s money and pay it all to the 
crystal ball reader. Then you can expect to be no better or no worse off than if 
you played the game trusting to your own guesses (or if you didn’t play the 
game at all: the fun of playing the game doesn’t count here, because the 
“economic man” doesn’t have any fun anyway). It follows that if you pay your 
informant anything less than a penny a guess, you are sure to be ahead in the 
long run. 

Now suppose the man with the crystal ball is a charlatan. He can’t guess 
the throws any better than you can. He knows, of course, that very soon he will 
give you wrong information and that when he does, you may balk at paying 
him for further “tip offs.” So he proposes what seems like a fair deal: you give 
him a percentage of your winnings only when you win and pay him nothing 
if his information proves false. Is it now worth while to employ him? This time 
a common sense argument says that it is worth nothing to have him around. 
If he is no better guesser than you are, you may as well make the guesses yourself
and not pay anything.²


¹ That is to say, the ratio of your winnings to your losses will be very close to one.
In the long run, both the winnings and the losses will be so large that their difference may 
be considerable. However your winnings (or losses) per game will be very small. 


² The tendency to believe what we would like to be true often overrules common
sense considerations. For an illuminating example, see the article on “Water Witching”
by Evon Z. Vogt in The Scientific Monthly, Vol. LXXV, No. 3 (September 1952).
But now consider the intermediate case, where the crystal ball is good but
not perfect. In other words, your informant can guess better than you, but he 
makes mistakes. Now is it worthwhile to pay him? Yes, it is worthwhile, and 
it is the more worthwhile the greater the difference between his guessing ability 
and yours. Certainly if you are as good as he is, there is no point in paying him. 
In other words, if your chances of guessing are as good as his, he is giving you 
no information in the long run. If he is better than you are, even if he is not 
a perfect guesser, your guessing record will be improved by the information he 
gives you, and the amount of improvement is, in a way, a measure of the 
information you receive from him. If he is a worse guesser than you are, you 
lose information if you follow his advice. This situation hints at a possibility 
of defining information quantitatively as the improvement of one’s chances of 
making the right guess. 
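
The three cases above (perfect informant, charlatan, imperfect but better guesser) can be put side by side in a small sketch. It assumes the penny game as described: a correct guess wins one penny, a wrong guess loses one. The function names and the accuracy figures passed to them are illustrative, not taken from the text, and the quantity computed is the improvement in expected winnings in pennies, not yet the logarithmic measure introduced later.

```python
def expected_gain_per_toss(accuracy):
    """Expected winnings in pennies per toss for a guesser who is right
    with probability `accuracy` (+1 penny if right, -1 penny if wrong)."""
    return accuracy * 1.0 + (1.0 - accuracy) * (-1.0)


def value_of_informant(own_accuracy, informant_accuracy):
    """Improvement in expected winnings per toss (in pennies) obtained by
    following the informant instead of one's own guesses. This is the most
    one could pay per reading without coming out behind in the long run."""
    return (expected_gain_per_toss(informant_accuracy)
            - expected_gain_per_toss(own_accuracy))


if __name__ == "__main__":
    # Hypothetical accuracies: perfect crystal ball, charlatan,
    # a good but imperfect informant, and one worse than chance.
    for informant in (1.0, 0.5, 0.75, 0.4):
        print(informant, value_of_informant(0.5, informant))
```

With your own accuracy at one half, a perfect informant is worth a full penny per reading, the charlatan nothing, and an informant worse than you a negative amount, which matches the verbal argument.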

In any situation, information about something we already know is worthless 
as information. The keen competition among newspapers for “scoops” reflects 
this attitude. A “scoop” carries more information than a re-write story. Any kind 
of a message carries more or less information in it depending on the state of 
knowledge of the recipients. This much has been known ever since messages 
were invented. In our own day of precise formulation of problems, however, 
an altogether new way of measuring the amount of information in a message is
being developed. 

In the example just cited a measure of the amount of information contained 
in a message is indicated in terms of how much such information is worth in a 
gambling situation. It is not necessary, however, to measure information in terms 
of its monetary value any more than it is necessary to depict chance events in 
terms of gambling situations. Such examples are often chosen because gambling 
has long served as a link between commonplace situations and sophisticated 
probabilistic arguments. There is more to the mathematical theory of informa- 
tion than a computation of how much we are willing to pay for the privilege of 
cheating in games or how the novelty of stories is reflected in the circulation of 
the newspapers that print them. 


The Mathematical Theory of Information 


THE MATHEMATICAL theory of information was born among communication
engineers and is commanding ever greater attention among mathematically
inclined biologists and semanticists. The reason for this increasing interest lies,
I think, in the fact that the mathematical theory of information has been recog- 
nized as another successful instance of making precise and quantitative an ex- 
tremely important concept which had been talked about only vaguely before. 
I believe that the notion of the “quantity of information” is a Big Idea in
science, similar in scope to the precise definition of “the amount of matter”
as registered on a balance or the “amount of energy” as derived from potentials,
velocities, and heat, or the “amount of entropy” as derived from the probabilities
of the states of a system. The vast importance of this new big idea is in
its potential applications to the fundamental biological and general semantics 
problems. We will touch on some of these below. Let us first take a closer look 
at some basic notions contained in the definition of the “amount of information.” 

As Warren Weaver has remarked,³ the amount of information in your message
is related not to what you are saying but to what you could say. This relation
links the amount of information in a message with the amount of pre-conceived 
knowledge about its content (recall the intuitive relation between the amount of 
information and how much we already know or can guess). 

Let us suppose that all you can say is “yes” and “no” (in other words, you
are as either-or-ish as you can possibly be). Then all you are ever expected to
say is “yes” or “no,” so that one already has a 50% “knowledge” of your potential
pronouncements. Thus, if you are entirely two-valued, you cannot give as
much information in your one-word speeches as you could if you were “multi-valued.”
If you selected your messages from ten possible ones, all equally likely,
then one could hope to guess what you are going to say only once in ten times, 
instead of every other time, and your information giving capacity would be con- 
siderably increased. 

The “canned” messages offered by Western Union (birthday greetings, etc.) 
carry far less information (and therefore are cheaper to send) than individually 
composed messages, because there are far fewer canned messages to choose from. 

In order to define the amount of information in a message, then, we must 
know the total number of messages in the repertoire of the source from which 
the message is chosen. Let us take a concrete case. 

For simplicity, we will assume that all messages are in code and consist of
combinations of two signals “1” and “0” (just as all Morse Code messages
are combinations of two signals “dit” and “dah”). We ask: how many different
messages can we send? Obviously if the length of the message is unlimited, we
can send an unlimited number of messages. Let us, therefore, consider only
messages of a certain length, say $n$ signals long. We can easily see that there
can be exactly $2^n$ distinct messages $n$ signals long. This follows, because we have
2 choices for the first signal (“1” or “0”), 2 for the second, which makes $2^2 = 4$
choices for a message of two signals. To each of these, we can again add
either of two signals to make a message three signals long, etc., so that to make
a message $n$ signals long, we have $2 \times 2 \times 2 \cdots 2$, a product of $n$ 2’s, or $2^n$
choices.
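
The counting argument can be checked by brute force. The following is a minimal sketch that lists every message of a given length over the two signals “1” and “0”; the function name `binary_messages` is an illustrative choice, not something from the text.

```python
from itertools import product


def binary_messages(n):
    """Return every distinct message n signals long, each signal being
    either "1" or "0". Each position offers 2 choices, so the list
    has 2**n entries."""
    return ["".join(signals) for signals in product("10", repeat=n)]


if __name__ == "__main__":
    for length in range(1, 5):
        messages = binary_messages(length)
        # The count doubles each time one more signal is appended.
        print(length, len(messages), messages)
```

Listing the messages is only practical for small lengths, since the count doubles with every added signal; the point of the sketch is simply that the enumeration and the product of 2’s agree.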


Therefore if you know that a certain message is $n$ “binary” signals long,⁴
you know you have one chance in $2^n$ to guess its contents exactly, provided all
the messages are equally likely. We could therefore take the number $2^n$ as a
measure of the amount of information such a message carries. But we don’t
have to take $2^n$. We can take some other number derived from $2^n$, if it is more
convenient to do so. The choice of a quantity with which we measure something
is not unique. For example, to measure the “size” of a circle, we can
take its diameter, but we are equally justified in taking its area, which is a
quantity derived from the diameter in a certain way.⁵ There is good reason for
taking as the measure of the amount of information, carried by a message $n$
binary signals long, not $2^n$ (which is the reciprocal of the probability of guessing
it, or, if you like, the “unlikelihood” of guessing it) but the logarithm of that
number.


³ Claude E. Shannon and Warren Weaver, The Mathematical Theory of Communication
(Urbana 1949).

⁴ That is, where each signal can take one of two possible values.


IF YOU REMEMBER your high school algebra, you will recall that the logarithm
of a number is the power to which a certain fixed number, called the “base,”
must be raised to get that number. If we conveniently take 2 as our base, then
$\log_2(2^n)$ (read “the logarithm to the base two of two to the n-th”) is just $n$.⁶
Thus, by the convention we have just established, a message $n$ binary signals
long contains $n$ “binary units” of information, or one binary unit per signal.
This binary unit is called a “bit” for short. Now we see the advantage of taking
the logarithm of the unlikelihood of guessing ($2^n$) for our measure, since we
can now say that a message twice as long (one $2n$ signals long) will contain
just twice the amount of information. This is a very convenient way of talking.⁷
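
The convenience can be written out in two lines. The symbol $I(n)$ for the amount of information in a message $n$ binary signals long is introduced here only for illustration and does not appear in the text; equally likely messages are assumed, as above.

```latex
\begin{align*}
I(n)  &= \log_2\!\left(2^{n}\right) = n \ \text{bits},\\
I(2n) &= \log_2\!\left(2^{2n}\right) = 2n = 2\,I(n).
\end{align*}
```

By contrast, the unlikelihood itself goes from $2^n$ to $2^{2n} = (2^n)^2$ when the length is doubled, so it is squared rather than doubled.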

It may have occurred to the reader that we have gone around in a circle. 
Would it not have been simpler to skip the argument about “probabilities of 
guessing” altogether and start out by a “‘natural” definition of the amount of 
information as simply a number proportional to the length of the message? 

It would, if we confined ourselves to messages from a single source. However
the interesting part of information theory deals with determining the
amount of information in a message in terms of the character of its source, not
merely in terms of its length (it isn’t what you say; it’s what you could say).
It is the amount of information per signal that we are interested in, in other
words, the rate at which information is coming at us as we are receiving the
message. This rate is one bit per signal in the case of a source with two equally
likely signals. Where there are more signals in the source, and especially where
the signals are not equally likely or where they are not independent, the amount
of information per signal is not nearly so easy to compute. For this purpose,
the “round about” definition is necessary. Furthermore, the “round about”
definition points up the connection between information theory and the possibility
of mathematicizing psychological and semantic concepts, as we shall see.


⁵ A mathematician would say, “The area is a certain function of the diameter.”

⁶ Obviously $n$ is the power to which 2 must be raised to get $2^n$.

⁷ To resort to our analogy again, suppose we knew nothing about areas and described
the sizes of circles by their diameters. If we had to paint circles for a living, we would
soon discover that to paint a circle “twice the size” would require four times as much
paint, one “thrice the size” would require nine times as much paint, etc. This would be
somewhat confusing until some genius suggested taking the square of the diameter as the
size measure (a semantic reform). Now the amount of paint would be strictly proportional
to the (newly defined) size of the circle. Similarly, if we take the logarithm of the
“unlikelihood of guessing” as a measure of the amount of information in a message, instead
of the unlikelihood itself, the amount of information will be proportional to the length
of the message.

Let us again suppose that we speak a language composed of two binary 
signals “1” and “0.” But let us now suppose that the “1” occurs far more frequently
than the “0.” Such is actually the case with the symbols of the languages
we ordinarily use. For example, good English can be written with some 30 
symbols (the 26 letters, a “space” and some punctuation marks). We can say 
definitely that some signals occur in English far more frequently than others. 
Or suppose that “1” and “0” are signals given out by a machine which is 
inspecting mass-produced parts, where “1” means “O.K.” and “0” means
“reject.” If on the average only one item in a hundred is defective, then the “1”
will register ninety-nine times as frequently as the “0.” How much information
is now contained in a message $n$ units long?

In view of what we said about the meaning of information, we must conclude
that in this case the amount of information contained in a message $n$ units
long must be less than $n$ bits, because we already have a good chance of guessing
what a message will say. If $n$ is, say, 10, we have better than nine chances
out of ten to guess the message if we guess it to be all “1’s.” Since the message
does not add as much to our knowledge as it would if the signals were equally
likely, we must conclude that it carries less than $n$ bits of information. But how
much less?

Suppose a message $n$ units long has $n_1$ “1’s” and $n_2$ “0’s,” so that
$n_1 + n_2 = n$. What is the probability of occurrence of such a message? If the occurrence
of one signal does not influence that of another, it doesn’t matter in what
order the signals occur. Since in our example the probability of a “1” is .99,
and that of a “0” is .01, the probability of $n_1$ “1’s” and $n_2$ “0’s” arranged in a
particular way (that is, the probability of a particular message) will be
$(.99)^{n_1}(.01)^{n_2}$. The logarithm of the reciprocal of this number to the base 2, as we
have agreed, will be a measure of the amount of information in such a message.
This logarithm is equal to $-n_1 \log_2(.99) - n_2 \log_2(.01)$.
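
As a hedged illustration of this expression, the sketch below evaluates the information in one particular message from the inspection machine, given its counts of “1’s” and “0’s.” The function name and the parameter names `n_ones` and `n_zeros` are illustrative stand-ins for $n_1$ and $n_2$; independence of the signals is assumed, as in the text.

```python
import math


def message_information(n_ones, n_zeros, p_one=0.99):
    """Information, in bits, of one particular message containing
    `n_ones` "1" signals and `n_zeros` "0" signals, assuming the
    signals are independent and a "1" has probability `p_one`.

    This is log2 of the reciprocal of the message's probability:
    -n1*log2(p_one) - n2*log2(1 - p_one).
    """
    p_zero = 1.0 - p_one
    return -n_ones * math.log2(p_one) - n_zeros * math.log2(p_zero)


if __name__ == "__main__":
    # Ten "O.K." signals in a row: the expected message, little information.
    print(message_information(10, 0))
    # Nine "O.K." signals and a single "reject": far more surprising.
    print(message_information(9, 1))
```

The all-“1’s” message of length ten carries about 0.145 bits, while a message with a single “reject” carries about 6.77 bits, almost all of it contributed by the one rare signal.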

Now we have the amount of information in a particular message with $n_1$
“1’s” and $n_2$ “0’s.” But we don’t want to measure the information of particular
messages. We want to measure the information of an average message $n$ signals
long coming from the source we have described. We will get this average if we
substitute for $n_1$ and $n_2$ their average values, averaged over a great many messages
coming from the source. Since the frequencies of the “1’s” and the “0’s”
are in the ratio of 99 to 1, it follows that the average value of $n_1$ will be 99
times that of $n_2$. Furthermore, $n_1 + n_2$ must equal $n$. Therefore $n_1 = .99n$ and
$n_2 = .01n$ on the average. Then the amount of information in an average
message $n$ signals long will be $-.99n \log_2(.99) - .01n \log_2(.01)$. If we
wish to express the amount of information per signal, we divide by $n$ and get
$-.99 \log_2(.99) - .01 \log_2(.01)$. If we calculate this number, we find it to be
equal to about .08 bits, or only about one-twelfth of what it would be if the “O.K.” and
“reject” signals were equally likely.
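
The arithmetic behind this figure can be laid out step by step, using $\log_2(.99) \approx -0.0145$ and $\log_2(.01) \approx -6.6439$ (values rounded to four decimal places):

```latex
\begin{align*}
H &= -0.99\,\log_2(0.99) - 0.01\,\log_2(0.01) \\
  &\approx 0.99 \times 0.0145 + 0.01 \times 6.6439 \\
  &\approx 0.0144 + 0.0664 \approx 0.081 \ \text{bits per signal}.
\end{align*}
```

Most of the total comes from the rare “reject” signal: although it occurs only once in a hundred signals, each occurrence carries more than six and a half bits.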

The method here described can be extended to compute the amount of 
information per signal from any source in which the occurrence of one signal 
does not influence the occurrence or non-occurrence of another. If the source 
has a repertoire of signals numbered 1 to N, and if they occur with relative 
frequencies (probabilities) $p_1, p_2, \ldots, p_N$, then the amount of information per
signal, usually denoted by $H$, is expressed in the following formula:


$$H = -p_1 \log p_1 - p_2 \log p_2 - p_3 \log p_3 - \cdots - p_N \log p_N.$$


In the example we solved there were only two signals, whose $p_1$ and $p_2$ were
respectively .99 and .01. 
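
The formula for $H$ translates directly into a few lines of code. This is a minimal sketch under two assumptions not stated in the text: logarithms are taken to the base 2, so that $H$ comes out in bits, and a signal of probability zero is taken to contribute nothing to the sum. The function name `entropy_bits` is illustrative.

```python
import math


def entropy_bits(probabilities):
    """Amount of information per signal, H, in bits, for a source whose
    signals are independent and occur with the given probabilities.

    Computes H = -sum(p * log2(p)). Terms with p == 0 are skipped,
    an assumed convention (such a signal never occurs).
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    print(entropy_bits([0.5, 0.5]))    # two equally likely signals
    print(entropy_bits([0.99, 0.01]))  # the inspection machine
    print(entropy_bits([0.1] * 10))    # ten equally likely messages
```

Two equally likely signals give one bit per signal, the inspection machine gives about 0.081 bits, and ten equally likely alternatives give about 3.32 bits, in line with the earlier remark that a “multi-valued” source can convey more per message than a two-valued one.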


Applications to Technological Communication Theory 


SO FAR we really did nothing but define terms and draw consequences from
our definitions. We said nothing concrete about why we should want to
make these particular definitions or draw these particular consequences. We did 
mention the looming importance of the information concept in semantics, psy- 
chology, and biology, but to some one who encounters this concept for the first 
time, the connection between it and what is generally thought to be the subject 
matter of biology, etc., is anything but clear. 


It is not easy to make such connections clear. In fact, the strenuous work 
of highly skilled specialists goes almost entirely into uncovering such connections.
They cannot therefore be obvious or intuitively evident or even easy to
understand when explained. All we can do within the scope of this article is 
give hints about the sort of reasoning which leads to uncovering the possibilities 
of applying the quantification of information to several scientific fields. 

The first step in solving a problem is to state it. The statement usually involves
a description of an existing state and a desirable state of affairs where
the factors involved in the discrepancy are explicitly pointed out. The success 
with which any problem is solved depends to a great extent on the clarity with 
which it is stated. In fact, the solution of the problem is, in a sense, a clarifica- 
tion (or concretization) of the objectives. Take the problem of curing disease. 
For ages, it had been implicitly stated thus:


A is sick.
This is bad. 
Let us find ways to make A well. 


Vague statements lead to vague methods, where success is erratic and ques- 
tionable. With the classification of diseases (as initiated, say, by Hippocrates), 
the problem is re-stated: 


A has a fever. 
This is bad. 
Let us look for ways to rid A of fever. 


Here there is more promise of success, because the events which make up sick- 
ness are somewhat extensionalized. Still further extensionalization appears with 
the discovery of events concomitant with the symptoms, for example the pres- 
ence of micro-organisms. Now the problem is 


A is infected with tuberculosis bacilli. 
They make A sick. 
Let us find ways to get rid of the bacilli. 


Further extensionalization could be, for example, a description of the bio- 
chemical processes characteristic of the tuberculosis bacilli which interfere with 
A’s biochemical processes, etc. The more a given problem is extensionalized, the 
greater promise there is in finding a solution. 


The problems of communication hygiene are now assuming an importance
equal to those of physiological hygiene. A naive statement of a communication
problem dates back to antiquity. 


A talks to B. 
B does not understand A. 
Let us explain to B what A means. 


However “attempts to explain” themselves depend on the proper functioning 
of the communication process. If this process is not understood, attempts to 
explain cannot be expected to have more success than the original attempt to 
communicate. The first steps in communication hygiene are therefore aimed at 
the understanding of the communication process. Hence the emergence of 
communication science. 

In examining instances of “failure to understand,” we see that it can occur
on different levels. A most obvious cause of such failure can be laid to the 
imperfect transmission of signals. B can fail to understand A simply because 
A talks with a heavy accent, or is a small child who has not learned to pronounce 
the words clearly or is talking over a telephone with a bad connection or over a 
radio with too much static. 

Communication problems on this level may deal with acoustics or electronics
but also with physiological functions such as hearing and sight and their
psychological correlates, the perception of “gestalts” and recognition.
Obviously no transmission and no reception of any signal is perfect. An
important class of questions in communication theory concerns the thresholds
of intelligibility. One wishes to know, for example, how bad static has to be
before it begins seriously to interfere with the transmission of spoken
information over a radio channel of given characteristics. Evidently both the
characteristics of transmission and reception and those of the subject matter
broadcast are important in the problem. Information theory provides a measure
of these variables. It provides, for example, a measure of the complexity
involved in “fidelity” of reproduction.8 It provides a method of estimating
quantitatively the effects of “noise” on reception, since the effects of noise
are equivalent to loss of information. It provides theoretical limits for the
performance of a channel of given characteristics, somewhat in the way
thermodynamics indicates the limits of efficiency of a heat engine.


Applications to Semantics 


The semanticist is usually unconcerned with these purely “technical” problems
of communication and leaves them to the communication engineer. Division
of labor is entirely proper in approaching any complicated set of problems; but
it is a mistake to take too seriously the dichotomies we set up in parceling out
the jobs. These dichotomies not only lead to the persistence of elementalistic
notions but also delay the discovery of analogous methods fruitful in the various
aspects of the problem. It may be true that the technical problems of long range
communication (radio, television, etc.) can be treated entirely independently
of the semantic content of the messages or the semantic reactions of the audience.
But it may also be true that the methods involved in treating those problems
(for example, the mathematical theory of information) can be applied in the
seemingly different context of the events which interest general semanticists,
psychologists, and others.

Such possibilities are already apparent. To point them out, we will examine
a little more closely the formula given above which describes the amount of
information in terms of the repertoire of the source and the relative frequencies
of the signals employed. As we said, the formula holds if the signals are
independent of each other. But what if this is not the case? What if the
occurrence of one signal influences the chances of the occurrence of another?
This is certainly true in the case where the source is the English alphabet, and
the messages consist of English sentences. In this case, it is almost certain
that the letter q will be followed by a u (barring comparatively rare words like
Iraqi). It is practically impossible for the letter z to be followed by a
consonant, etc.

8 If a musical tone has few overtones, it can be mathematically described by a few
parameters (the relative intensities of its overtones) and thus carries less
information and is easier to reproduce with fidelity than a tone which carries
many overtones.

Under these conditions, the formula for the amount of information per sig- 
nal must be modified. We will not go into the details of this modification here. 
We will only point out that the problem of computing the amount of informa- 
tion under various conditions of communication has led to a number of im- 
portant concepts, in terms of which the technical problems of communication 
are described. One important characteristic of those concepts is that they are
often stated in mathematical language and therefore the techniques of
mathematical deduction can be applied to them. This circumstance makes the
problems of communication much more explicit and the solutions to such problems
easier to find.

Another important advantage of those concepts is that they give hints on
how the precise methods of dealing with communication in the (comparatively)
uncomplicated area of technology could be extended to the more complicated
areas of psychology, semantics, and general semantics. For example, the
modification of our formula on page 252 to take into account the interdependence
of signals gives rise to the concept of “redundancy” of the source output, and
if the source is an entire language, this concept can be extended to mean the
“redundancy” of a language. In information theory, redundancy is a measure
of the interdependence of the signals. But redundancy also has an intuitive
component, and the precise definition makes possible the extensionalization of
this intuitive component.
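
The effect of interdependence on the measure can be sketched numerically. The
following is only an illustration under stated assumptions: it takes the amount
of information per signal to be the usual sum of -p log2 p over the relative
frequencies, takes redundancy to be the fraction by which that amount falls
short of its maximum for the given repertoire, and accounts for interdependence
only between adjacent signals. The function names and the sample string are
invented for the illustration.

```python
import math
from collections import Counter

def entropy_bits(counts):
    """Information per item, in bits, computed from a table of counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def conditional_entropy_bits(text):
    """Information per signal when the preceding signal is already known.

    Uses the identity H(next | previous) = H(pairs) - H(previous).
    """
    pairs = Counter(zip(text, text[1:]))
    previous = Counter(text[:-1])
    return entropy_bits(pairs) - entropy_bits(previous)

def redundancy(bits_per_signal, repertoire_size):
    """Fraction by which the information falls short of its maximum."""
    return 1.0 - bits_per_signal / math.log2(repertoire_size)

# Hypothetical source: two signals, equally frequent, but each one
# completely determined by the one before it.
sample = "abababababababab"

h_independent = entropy_bits(Counter(sample))   # frequencies only
h_dependent = conditional_entropy_bits(sample)  # adjacent linkage included

print(h_independent, redundancy(h_independent, 2))
print(h_dependent, redundancy(h_dependent, 2))
```

Judged by frequencies alone this source looks as informative as a source of two
signals can be; once the linkage between neighbors is taken into account, each
new signal tells us nothing, and the redundancy is complete.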


The connection between the precise and the intuitive notions of redundancy 
is dramatically illustrated in C. Shannon’s monograph, The Mathematical Theory 
of Communication. Suppose we put all the letters of the English alphabet into 
a hat in equal amounts and pull them out one by one “at random.” What would 
they spell? Here is a sample of such a “language.” 


XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD
QPAAMKBZAACIBZLHJQD

In anyone’s estimation this sample does not “make sense.” Now suppose
that instead of putting the letters into the hat in equal numbers, we put them
in proportionally to the frequency with which they actually occur in English 9
and again pull them out at random. The resulting sample now looks like this.


OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI
ALHENHTTPA OOBTTVA NAH BRL.


9 The relative frequencies of letters in English have been made familiar by the
printer’s cliché, etaoin shrdlu ...

This still doesn’t make “sense.” But there is no question that it makes
somewhat more “sense” than before. It looks more like English. It does not
bristle quite so much with J’s and Z’s. Somehow we feel that a “gradation” of
sense can be established even among random samples of letters. The feeling is
strengthened when we perform the next experiment. We now put into our hat not
single letters but pairs, taking care to keep their numbers proportional to
their actual occurrence in English. Now we get the following sample.


ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN
D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY
TOBE SEACE CTISBE.


Now there is no doubt that we are approaching “English.” The sample
contains two or three real English words and several “near-words” like DEAMY
and TEASONARE. A sample of “triples” looks even better.


IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID 
PONDENOME OF DEMONSTURES OF THE REPTAGIN IS 
REGOACTION OF CRE. 


Perhaps this sample reminds us of Jabberwocky. It should, because Jabber- 
wocky too is an “approximation” to English, a very good approximation that 
almost makes real sense. 
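
The hat experiments are easy to imitate. The sketch below follows the
description given here literally: slips bearing single letters, pairs, or
triples go into the hat in proportion to their occurrence in some body of
English text and are drawn independently. It is an illustration only. The
function names and the tiny corpus are invented for it, and Shannon's own
samples of pairs and triples were constructed somewhat differently, by letting
each letter depend on the letters just before it.

```python
import random
from collections import Counter

def letter_groups(text, n):
    """All overlapping groups of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def equal_hat(alphabet, draws, rng):
    """Every letter is in the hat in equal numbers."""
    return "".join(rng.choice(alphabet) for _ in range(draws))

def proportional_hat(text, n, draws, rng):
    """Slips of n letters, stocked in proportion to their occurrence in text."""
    hat = Counter(letter_groups(text, n))
    slips = rng.choices(list(hat), weights=list(hat.values()), k=draws)
    return "".join(slips)

# A stand-in corpus, far too short for serious use; any long English text
# could be substituted.
corpus = "THE FEELING IS STRENGTHENED WHEN WE PERFORM THE NEXT EXPERIMENT "
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

rng = random.Random(0)  # fixed seed so that a run can be repeated
print(equal_hat(alphabet, 40, rng))
print(proportional_hat(corpus, 1, 40, rng))
print(proportional_hat(corpus, 2, 20, rng))
print(proportional_hat(corpus, 3, 14, rng))
```

With a corpus of respectable length the successive lines should show the same
gradation as the samples above, from alphabet soup toward something that looks
like English.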

What can be done with letters can be done with words. Compare, for 
example, the sample of randomly selected words, 


REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME 
CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE 
TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE 
MESSAGE HAD BE THESE 


with a sample of randomly selected pairs of words, 


THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH 
WRITER THAT THE CHARACTER OF THIS POINT IS THERE- 
FORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME 
OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED, 


and see how much more “sense” there is in the second, although it still doesn’t
“mean” anything.10

These “approximations” to English are examples of how the intuitive feeling 
that one piece of gibberish is somehow closer to the English language than 
another is a reflection of a precisely and quantitatively defined situation.11 The 
situation has to do with the characteristic linkages used in English. The extent 
of these linkages is also a measure of the redundancy of the English language. 


10 All the foregoing samples have been taken from C. Shannon, op. cit.

11 Everyone has experienced hearing a conversation in which no words could be
distinguished but where one could be reasonably sure that the language spoken
was or was not English.



SUMMER 1953 WHAT IS INFORMATION? 


Redundancy can also be taken as a measure of the fraction of letters which can 
be randomly deleted from a reasonably long message without making the message 
unintelligible. FR EXMPLE WENTYIVE PRCET OF HE LTTERS I TIS 
SENTENCE HVEBEN DLETED AT RANM. The redundancy of English is 
said to be over 50%. 
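
The deletion experiment can be repeated on any message. The sketch below is one
way of doing it, assuming that only letters are eligible for deletion and that
spaces are left alone; the function name, the seed, and the choice of message
are made for the illustration.

```python
import random

def delete_letters(message, fraction, rng):
    """Delete the given fraction of the letters at random, keeping spaces."""
    letter_positions = [i for i, ch in enumerate(message) if ch.isalpha()]
    how_many = round(fraction * len(letter_positions))
    doomed = set(rng.sample(letter_positions, how_many))
    return "".join(ch for i, ch in enumerate(message) if i not in doomed)

rng = random.Random(1953)  # arbitrary seed so that a run can be repeated
original = ("FOR EXAMPLE TWENTY FIVE PERCENT OF THE LETTERS IN THIS "
            "SENTENCE HAVE BEEN DELETED AT RANDOM")
damaged = delete_letters(original, 0.25, rng)
print(damaged)
```

Raising the fraction step by step and noting where a reader begins to fail
gives a rough, intuitive estimate of the redundancy in question.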

Redundancy is thus both a linguistic and a mathematical term. The more 
redundancy there is in a source, the more tolerance there is for noise and other 
imperfections of transmission without serious interference with intelligibility. 
The importance of the redundancy concept in cryptography is likewise apparent.
The more redundant the source of messages, the easier it is to break a code. 
In stenography redundancy is a measure of the amount of drastic abbreviation 
that can be introduced without danger of confusion. All these linguistic matters 
are contiguous to the field of interest of semanticists and of general semanticists. 
A manner of expression full of clichés is, of course, high in redundancy. It turns 
out in the mathematical theory of information that messages from a cliché-ridden 
source (such as the oratorical repertoire of a run-of-the-mill politician) are also 
poor in information. This is something semanticists have known all along, but 
it is gratifying to have this knowledge formulated precisely. Precisely formulated 
knowledge is valuable not only for its own sake but also as a jumping-off place 
to new knowledge. 


Connection with Physics 


The mathematicians who derived the formula for the amount of information 
soon noticed that it looks exactly like the formula for entropy in statistical me- 
chanics. Mathematicians are often excited by such analogies. There is an important 
difference between a mathematical analogy and an ordinary “metaphorical” one. 
Arguments based on ordinary analogies are seldom conclusive. For example, 
just because it is true that natural selection benefits the survival of a species, it 
does not follow that economic competition is indispensable for the vigor of a 
nation. Nor is the justification of capital punishment convincing on the basis of 
its analogy with surgery applied to a diseased part of the body. A mathematical 
analogy, however, is a quite different matter. Such analogy is evidence of similar 
structure in two or more classes of events, and a great deal can be deduced from 
such similarity. For example, because both electrical and mechanical oscillators 
can be described by the same kinds of equations, it follows that a great deal of 
reasoning which applies to one applies also to the other. Since the analogy 
between information and entropy is a mathematical analogy, it too may be
symptomatic of a structural similarity in the events involved in the determination
of physical entropy and those involved in the measurement of information. It
seems worthwhile, therefore, to look at this analogy more closely. 

The concept of entropy was first introduced into thermodynamics as a measure 
of the unavailability of heat energy for transformation into useful work. The 
principle of conservation of energy (the First Law of thermodynamics) says that
a given amount of heat is equivalent in terms of its energy content to a given 
amount of work. For example, the heat energy of a slice of bread (about 100 
“large” calories) is theoretically convertible into about 300,000 foot pounds of 
work. But this equivalence of heat and work does not mean that we can take 
heat from any source and convert it all into work. If this were true, there would 
be no need of fuels. We could take the practically inexhaustible heat of the 
oceans and drive all our machinery with it, with only a slight cooling of the 
oceans as the result. But this cannot be done. In any engine, where heat is con- 
verted into work, this can be done only if heat is allowed to flow from a source 
at higher temperature to a sink at lower temperature. Thus a difference in tem- 
perature is indispensable for turning heat into work. The boiler and the cooler 
of a steam engine illustrate this principle. The less the difference in temperatures, 
the smaller is the fraction of the amount of heat which can be transformed into 
work, i.e., the smaller the efficiency of the engine. 
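
The two statements in this paragraph can be checked with a little arithmetic.
The sketch below converts the slice of bread into foot pounds and then applies
the standard ideal-engine limit, one minus the ratio of the absolute
temperatures of sink and source, which is brought in here as an assumption and
is not stated in the text above. The temperatures chosen are merely
illustrative.

```python
JOULES_PER_LARGE_CALORIE = 4184.0      # one kilocalorie
JOULES_PER_FOOT_POUND = 1.3558179483   # one foot pound of work

def large_calories_to_foot_pounds(calories):
    """Work equivalent of a quantity of heat (First Law)."""
    return calories * JOULES_PER_LARGE_CALORIE / JOULES_PER_FOOT_POUND

def ideal_efficiency(source_kelvin, sink_kelvin):
    """Greatest fraction of heat convertible to work between two temperatures."""
    return 1.0 - sink_kelvin / source_kelvin

print(large_calories_to_foot_pounds(100))   # the slice of bread

# Illustrative temperatures only: the smaller the difference,
# the smaller the fraction of heat available as work.
for source in (600.0, 400.0, 310.0, 300.0):
    print(source, ideal_efficiency(source, 300.0))
```

When source and sink are at the same temperature the fraction falls to zero,
which is the case of the ocean with nothing colder to receive its heat.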

Now entropy is, among other things, a measure of the equalization of tem- 
perature throughout a system. If the temperature is constant throughout a sys- 
tem, the entropy is greatest in it, and none of the heat is available for work. 


IN CLASSICAL thermodynamics, entropy was expressed in terms of the heat and
the temperature of the system. With the advent of the kinetic theory of
matter, an entirely new approach to thermodynamics was developed. Temperature
and heat are now pictured in terms of the kinetic energy of the molecules
comprising the system, and entropy becomes a measure of the probability that the
velocities of the molecules and other variables of a system are distributed in a
certain way. The reason the entropy of a system is greatest when its temperature
is constant throughout is that this distribution of temperature is the most
probable. Increase of entropy was thus interpreted as the passage of a system
from less probable to more probable states.

A similar process occurs when we shuffle a deck of cards. If we start with 
an orderly arrangement, say the cards of each suit following each other accord- 
ing to their value, the shuffling will tend to make the arrangement disorderly. 
But if we start with a disorderly arrangement, it is very unlikely that through 
shuffling the cards will come into an orderly one. This is so, because there are 
many more “disorderly” than orderly arrangements, and so the disorderly state
of a deck of cards is more probable. 

Thus, the “amount of order” is connected with probabilistic concepts and
through them with entropy (the less order, the more entropy). But it is also
connected with the “amount of information.” For example, far less information
is required to describe an orderly arrangement of the cards than a disorderly one.
If I say “Starting with ace, deuce, etc., to king; hearts, diamonds, clubs, spades,”
I have determined the position of every card in the deck. But to describe an
arbitrary random arrangement, I have to specify every one of the fifty-two cards.

It is through these notions of probability, order, and disorder that entropy is 
related to information. The formal equivalence of their mathematical expressions 
indicates that both concepts describe similarly structured events. 
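
The card example can be put in numbers. The sketch below assumes that every one
of the possible orderings of the deck is equally likely and measures
information in bits, that is, in two-way choices; the variable names are
invented for the illustration.

```python
import math

CARDS = 52

# Number of distinct arrangements of the deck.
arrangements = math.factorial(CARDS)

# Information needed to single out one arrangement among all of them.
bits_for_any_arrangement = math.log2(arrangements)

# A cruder description that names each card in turn from the full deck,
# ignoring the fact that cards already named cannot recur.
bits_naming_each_card = CARDS * math.log2(CARDS)

print(arrangements)
print(bits_for_any_arrangement)
print(bits_naming_each_card)
```

The orderly deck, by contrast, is pinned down by a single short rule, however
many cards the deck contains.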


Applications to Biology 


BOTH ENTROPY and information can be defined in terms of the same kinds
of variables, namely probabilities of events. Now entropy plays an important
part in chemistry and in biochemistry. For example, the knowledge of the 
entropies of two states of a system indicates whether the system can pass from 
one state to the other spontaneously, say whether a certain chemical reaction 
can take place without outside interference. The interesting thing about chemical 
reactions in living organisms is that many of them are such that they do not 
ordinarily take place without interference. Such are, for example, the synthesis 
of sugars from water and carbon dioxide by green plants and the synthesis of 
complicated proteins from amino-acids by animals. In these reactions, the
ordinary processes (oxidation of sugars and the decomposition of proteins) are
reversed. Therefore there must be interference. Early thinkers on this subject
postulated the operations of “vital forces” within living things which made these
“up-hill” reactions possible.

No evidence of any phenomenon explicitly violating known physical and
chemical laws has ever been observed in any organism. True, “up-hill” reactions
seem to contradict the law of thermodynamics which demands a continuous
increase in entropy (the Second Law of thermodynamics), but there is nothing
in the law that says that it cannot be circumvented locally. In other words, what
living things seem to do is create little “islands of order” in themselves at the
expense of increased disorder elsewhere. This is the meaning of Schroedinger’s
famous remark that “life feeds on negative entropy.” 12

Life, therefore, depends essentially on an ordering process, on fighting off 
the general trend toward chaos, which is always present in the non-living world. 
But to increase the order of anything means to make it describable with less 
information (less effort). And is this process not the very essence of knowledge, 
of science itself? Or of any behavior where complex skills are involved? When 
a chess genius plays a dozen games simultaneously from memory, or when a 
musician masters the intricate complexity of muscular movements which go 
into the rendition of a musical creation, or when a scientist weaves a mass of 
seemingly unrelated data into a monolithic theory, they are all contributing to 
the process of decreasing the “entropy” of a portion of the world, of making 
it more comprehensible with less effort. 

Organisms, geneticists tell us, evolve by suffering random genetic variations,
which in the process of many generations are selected for their survival value.
If these variations were independent of each other, nothing would ever come
of evolution. God alone knows how many mutations it took to enable our
pre-human ancestor to speak and to make him want to speak. Single mutations are
improbable enough. But when many have to combine to give rise to some complex
patterns of behavior, such as speech, the probability of their being so combined
would be infinitesimally small, if the mutations were accumulated independently
of each other. But they are not accumulated independently. Rather
they seem to be “hoarded” like pieces of a jigsaw puzzle, so that in the process of
accumulation gaps arise, which are later filled by the proper mutations.

12 E. Schroedinger, What Is Life? (Cambridge and New York 1945).

Thus evolution itself is an ordering process. Gross changes of structure, of 
physiology, and of behavior are made possible because structures, physiological 
processes, and behavior patterns are always being organized into assemblies. 
The promise which information theory holds for biology is the same that it 
holds for linguistics and semantics—the promise to make possible a precise 
language for talking about the structure of assemblies and the fundamental 
processes involving the emergence of order from chaos and chaos from order. 

Korzybski and others maintained that structure is the only content of
knowledge. Korzybski also emphasized the “false-to-factness” of the two-valued
orientation. It is therefore futile to suppose that a portion of the world either
is or is not “structured.” It may be partially structured. Therefore a measure
of structure or of the amount of organization is required. We have seen how
through the entropy concept “amount of information” can be equated to the
“amount of disorder.” That is not to say that information is a carrier of
disorder. On the contrary, information is the carrier of order. What is meant
by the equivalence is that the more disordered a portion of the world is, the 
more information is required to describe it completely, that is, to make it known. 
Thus the process of obtaining knowledge is quantitatively equated to the process 
of ordering portions of the world. 

Significantly, attempts are made to define the life process itself in terms of 
the creation of order. It has been proposed, for example, to define the amount 
of entropy in an organism as equal to the amount of entropy it would have if it 
were completely disorganized minus the amount of information necessary to 
construct it from its disorganized state. It follows that the more complex an 
organism is, the more ordered it is, the less entropy it contains. One also suspects 
from this definition that it is easier to construct a dead organism than a live one. 
But one also conjectures that the only difficulty in constructing an actual living 
thing is that an immense amount of information is required to do so. 

Living things, therefore, appear in the light of information theory as the
carriers of knowledge (i.e., of structure). Man’s unique place in the universe is
in that he not only carries this “physiological knowledge” within him but has
also developed a “second order knowledge,” a knowledge of what knowledge
consists of, and has thus added a new dimension to the life process.


RECENT CONTRIBUTIONS TO
THE MATHEMATICAL THEORY 
OF COMMUNICATION 


WARREN WEAVER * 


THE GENERAL SETTING OF THE 
ANALYTICAL COMMUNICATION STUDIES 


THE WORD communication will be used here in a very broad sense to include
all of the procedures by which one mind may affect another. This, of course,
involves not only written and oral speech, but also music, the pictorial arts, the
theater, the ballet, and in fact all human behavior. In some connections it may
be desirable to use a still broader definition of communication, namely, one which
would include the procedures by means of which one mechanism (say automatic
equipment to track an airplane and to compute its probable future positions)
affects another mechanism (say a guided missile chasing this airplane).


* Dr. Weaver is Research Director of the Rockefeller Foundation. The present paper
is his contribution to The Mathematical Theory of Communication by Claude Shannon and
Warren Weaver. It is reprinted by kind permission of the author and of the University of
Illinois Press.

The author states in a footnote that he is responsible for both the ideas and the form
of the first and third sections of this paper. The middle section, “Communication Problems
of Level A,” he presents as an interpretation of mathematical papers by Dr. Claude E.
Shannon of the Bell Telephone Laboratories and gives the following historical account of
the ideas involved:

Dr. Shannon’s work roots back, as von Neumann has pointed out, to Boltzmann’s
observation, in some of his work on statistical physics (1894), that entropy is related to
“missing information,” inasmuch as it is related to the number of alternatives which remain
possible to a physical system after all the macroscopically observable information concerning
it has been recorded. L. Szilard (Zeitschrift fuer Physik, Vol. 53, 1925) extended this idea
to a general discussion of information in physics, and von Neumann (Mathematical
Foundations of Quantum Mechanics, Berlin, 1932, Chapter V) treated information in quantum
mechanics and particle physics. Dr. Shannon’s work connects more directly with certain
ideas developed some twenty years ago by H. Nyquist and R. V. L. Hartley, both of the
Bell Laboratories; and Dr. Shannon has himself emphasized that communication theory
owes a great debt to Professor Norbert Wiener for much of its early philosophy. Professor
Wiener, on the other hand, points out that Shannon’s early work on switching and
mathematical logic antedated his own interest in this field; and generously adds that Shannon
certainly deserves credit for independent development of such fundamental aspects of the
theory as the introduction of entropic ideas. Shannon has naturally been specially
concerned to push the applications to engineering communication, while Wiener has been
more concerned with biological application (central nervous system phenomena, etc.).

The language of this memorandum will often appear to refer to the special, 
but still very broad and important, field of the communication of speech; but 
practically everything said applies equally well to music of any sort, and to still 
or moving pictures, as in television. 


Three Levels of Communications Problems 


RELATIVE TO THE broad subject of communication, there seem to be problems
at three levels. Thus it seems reasonable to ask serially:

LEVEL A. How accurately can the symbols of communication be transmitted? 
(The technical problem.) 

LEVEL B. How precisely do the transmitted symbols convey the desired mean- 
ing? (The semantic problem.) 

LEVEL C. How effectively does the received meaning affect conduct in the
desired way? (The effectiveness problem.)

The technical problems are concerned with the accuracy of transference from 
sender to receiver of sets of symbols (written speech), or of a continuously 
varying signal (telephonic or radio transmission of voice or music), or of a 
continuously varying two-dimensional pattern (television), etc. Mathematically, 
the first involves transmission of a finite set of discrete symbols, the second the 
transmission of one continuous function of time, and the third the transmission 
of many continuous functions of time or of one continuous function of time 
and of two space coordinates. 

The semantic problems are concerned with the identity, or satisfactorily close 
approximation, in the interpretation of meaning by the receiver, as compared 
with the intended meaning of the sender. This is a very deep and involved 
situation, even when one deals only with the relatively simpler problems of 
communicating through speech. 

One essential complication is illustrated by the remark that if Mr. X is
suspected not to understand what Mr. Y says, then it is theoretically not
possible, by having Mr. Y do nothing but talk further with Mr. X, completely to
clarify this situation in any finite time. If Mr. Y says “Do you now understand
me?” and Mr. X says “Certainly I do,” this is not necessarily a certification that
understanding has been achieved. It may just be that Mr. X did not understand
the question. If this sounds silly, try it again as “Czy pan mnie rozumie?” with
the answer “Hai wakkate imasu.” I think the basic difficulty 1 is, at least in the
restricted field of speech communication, reduced to a tolerable size (but never
completely eliminated) by “explanations” which (a) are presumably never more
than approximations to the ideas being explained, but which (b) are
understandable since they are phrased in a language which has previously been made
reasonably clear by operational means. For example, it does not take long to
make the symbol for “yes” in any language operationally understandable.

1 “When Pfungst (1911) demonstrated that the horses of Elberfeld, who were
showing marvelous linguistic and mathematical ability, were merely reacting to
movements of the trainer’s head, Mr. Krall (1911), their owner, met the criticism
in the most direct manner. He asked the horses whether they could see such small
movements and in answer they spelled out an emphatic ‘No.’ Unfortunately, we
cannot all be so sure that our questions are understood or obtain such clear
answers.” See K. S. Lashley, “Persistent Problems in the Evolution of Mind” in
Quarterly Review of Biology, Vol. 24 (March 1949), p. 28.

The semantic problem has wide ramifications if one thinks of communication 
in general. Consider, for example, the meaning to a Russian of a U.S. newsreel 
picture. 

The effectiveness problems are concerned with the success with which the 
meaning conveyed to the receiver leads to the desired conduct on his part. It 
may seem at first glance undesirably narrow to imply that the purpose of all 
communication is to influence the conduct of the receiver. But with any reason- 
ably broad definition of conduct, it is clear that communication either affects 
conduct or is without any discernible and probable effect at all. 

The problem of effectiveness involves esthetic considerations in the case 
of the fine arts. In the case of speech, written or oral, it involves considerations 
which range all the way from the mere mechanics of style, through all the 
psychological and emotional aspects of propaganda theory, to those value
judgments which are necessary to give useful meaning to the words “success” and
“desired” in the opening sentence of this section on effectiveness.

The effectiveness problem is closely interrelated with the semantic problem, 
and overlaps it in a rather vague way; and there is in fact overlap between all 
of the suggested categories of problems. 


Comments 


SO STATED, one would be inclined to think that Level A is a relatively
superficial one, involving only the engineering details of good design of a
communication system; while B and C seem to contain most if not all of the
philosophical content of the general problem of communication.


The mathematical theory of the engineering aspects of communication, as 
developed chiefly by Claude Shannon at the Bell Telephone Laboratories, ad- 
mittedly applies in the first instance only to problem A, namely the technical 
problem of accuracy of transference of various types of signals from sender to 
receiver. But the theory has, I think, a deep significance which proves that the 
preceding paragraph is seriously inaccurate. Part of the significance of the new 
theory comes from the fact that levels B and C, above, can make use only of 
those signal accuracies which turn out to be possible when analyzed at level A. 
Thus any limitations discovered in the theory at Level A necessarily apply to 
levels B and C. But a larger part of the significance comes from the fact that 
the analysis at Level A discloses that this level overlaps the other levels more 
than one could possibly naively suspect. Thus the theory of Level A is, at least
to a significant degree, also a theory of levels B and C. I hope that the succeeding
parts of this memorandum will illuminate and justify these remarks.


ETC.: A REVIEW OF GENERAL SEMANTICS VOL. X, NO. 4


COMMUNICATION PROBLEMS AT LEVEL A 


A Communication System and Its Problems 


THE COMMUNICATION system considered may be symbolically represented as
follows:

FIGURE 1


The information source selects a desired message out of a set of possible
messages (this is a particularly important remark, which requires considerable
explanation later). The selected message may consist of written or spoken
words, or of pictures, music, etc.

The transmitter changes this message into the signal which is actually sent 
over the communication channel from the transmitter to the receiver. In the 
case of telephony, the channel is a wire, the signal a varying electric current on 
this wire; the transmitter is the set of devices (telephone transmitter, etc.) 
which change the sound pressure of the voice into the varying electrical current. 
In telegraphy, the transmitter codes written words into sequences of interrupted 
currents of varying lengths (dots, dashes, spaces). In oral speech, the informa- 
tion source is the brain, the transmitter is the voice mechanism producing the 
varying sound pressure (the signal) which is transmitted through the air (the 
channel). In radio, the channel is simply space (or the ether, if any one still 
prefers that antiquated and misleading word), and the signal is the electro- 
magnetic wave which is transmitted. 

The receiver is a sort of inverse transmitter, changing the transmitted signal
back into a message, and handing this message on to the destination. When I
talk to you, my brain is the information source, yours the destination; my vocal
system is the transmitter, and your ear and the associated eighth nerve is the
receiver.
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In the process of being transmitted, it is unfortunately characteristic that 
certain things are added to the signal which were not intended by the informa- 
tion source. These unwanted additions may be distortions of sound (in tele- 
phony, for example) or static (in radio), or distortions in shape or shading of 
picture (television), or errors in transmission (telegraphy or facsimile), etc. 
All of these changes in the transmitted signal are called nozsse. 

The kind of questions which one seeks to ask concerning such a communica- 
tion system are: 

a. How does one measure amount of information? 

b. How does one measure the capacity of a communication channel? 

c. The action of the transmitter in changing the message into the signal 
often involves a coding process. What are the characteristics of an efficient 
coding process? And when the coding is as efficient as possible, at what rate 
can the channel convey information? 

d. What are the general characteristics of noise? How does noise affect 
the accuracy of the message finally received at the destination? How can one 
minimize the undesirable effects of noise, and to what extent can they be 
eliminated? 

e. If the signal being transmitted is continuous (as in oral speech or music) 
rather than being formed in discrete symbols (as in written speech, telegraphy, 
etc.) how does this fact affect the problem? 

We will now state, without any proofs and with a minimum of mathe- 
matical terminology, the main results which Shannon has obtained. 


Information 


THE WORD information, in this theory, is used in a special sense that must not
be confused with its ordinary usage. In particular, information must not be
confused with meaning.

In fact, two messages, one of which is heavily loaded with meaning and
the other of which is pure nonsense, can be exactly equivalent, from the present
viewpoint, as regards information. It is this, undoubtedly, that Shannon means
when he says that “the semantic aspects of communication are irrelevant to the
engineering aspects.” But this does not mean that the engineering aspects are
necessarily irrelevant to the semantic aspects.

To be sure, this word information in communication theory relates not so 
much to what you do say, as to what you could say. That is, information is a 
measure of one’s freedom of choice when one selects a message. If one is con- 
fronted with a very elementary situation where he has to choose one of two 
alternative messages, then it is arbitrarily said that the information, associated 
with this situation, is unity. Note that it is misleading (although often con- 
venient) to say that one or the other message conveys unit information. The 
concept of information applies not to the individual messages (as the concept
of meaning would), but rather to the situation as a whole, the unit information
indicating that in this situation one has an amount of freedom of choice, in
selecting a message, which it is convenient to regard as a standard or unit amount.

The two messages between which one must choose in such a selection can 
be anything one likes. One might be the text of the King James Version of the 
Bible, and the other might be “Yes.” The transmitter might code these two
messages so that “zero” is the signal for the first, and “one” the signal for the
second; or so that a closed circuit (current flowing) is the signal for the first, 
and an open circuit (no current flowing) the signal for the second. Thus the 
two positions, closed and open, of a simple relay, might correspond to the two 
messages. 

To be somewhat more definite, the amount of information is defined, in the
simplest cases, to be measured by the logarithm of the number of available
choices. It being convenient to use logarithms* to the base 2, rather than
common or Briggs’ logarithms to the base 10, the information, when there are
only two choices, is proportional to the logarithm of 2 to the base 2. But this
is unity; so that a two-choice situation is characterized by information unity, as
has already been stated above. This unit of information is called a “bit,” this
word, first suggested by John W. Tukey, being a condensation of “binary digit.”
When numbers are expressed in the binary system there are only two digits,
namely 0 and 1; just as ten digits, 0 to 9 inclusive, are used in the decimal
number system which employs 10 as a base. Zero and one may be taken
symbolically to represent any two choices, as noted above; so that “binary digit”
or “bit” is natural to associate with the two-choice situation which has unit
information.

If one has available say 16 alternative messages among which he is equally
free to choose, then since $16 = 2^4$ so that $\log_2 16 = 4$, one says that this
situation is characterized by 4 bits of information.

* When $m^x = y$, then x is said to be the logarithm of y to the base m.

It doubtless seems queer, when one first meets it, that information is
defined as the logarithm of the number of choices. But in the unfolding of the
theory, it becomes more and more obvious that logarithmic measures are in
fact the natural ones. At the moment, only one indication of this will be given.
It was mentioned above that one simple on-or-off relay, with its two positions
labeled, say, 0 and 1 respectively, can handle a unit information situation, in
which there are but two message choices. If one relay can handle unit information,
how much can be handled by say three relays? It seems very reasonable to want
to say that three relays could handle three times as much information as one.
And this indeed is the way it works out if one uses the logarithmic definition of
information. For three relays are capable of responding to $2^3$ or 8 choices,
which symbolically might be written as 000, 001, 011, 010, 100, 110, 101, 111,
in the first of which all three relays are open, and in the last of which all three
relays are closed. And the logarithm to the base 2 of $2^3$ is 3, so that the
logarithmic measure assigns three units of information to this situation, just
as one would wish. Similarly, doubling the available time squares the number
of possible messages, and doubles the logarithm; and hence doubles the
information if it is measured logarithmically.
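
The relay argument can be checked with a few lines of arithmetic. The following is only a sketch in Python; the function name `bits_for_choices` is an assumption introduced for illustration and does not come from Shannon's or the present text.

```python
import math

def bits_for_choices(n_choices):
    """Information, in bits, of a free choice among n equally likely alternatives."""
    if n_choices < 1:
        raise ValueError("need at least one alternative")
    return math.log2(n_choices)

# One relay has 2 positions; k relays taken together have 2**k positions.
one_relay = bits_for_choices(2)
three_relays = bits_for_choices(2 ** 3)
sixteen_messages = bits_for_choices(16)

# Doubling the available time squares the number of possible messages,
# which doubles the logarithmic measure.
n_messages = 8
doubled_time = bits_for_choices(n_messages ** 2)

print(one_relay, three_relays, sixteen_messages, doubled_time)
```

Three relays come out at exactly three times the figure for one relay, which is the additivity that makes the logarithm the natural measure.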


THE REMARKS thus far relate to artificially simple situations where the
information source is free to choose only between several definite messages—like a
man picking out one of a set of standard birthday greeting telegrams. A more
natural and more important situation is that in which the information source
makes a sequence of choices from some set of elementary symbols, the selected
sequence then forming the message. Thus a man may pick out one word after
another, these individually selected words then adding up to form the message.


At this point an important consideration which has been in the background,
so far, comes to the front for major attention. Namely, the role which
probability plays in the generation of the message. For as the successive symbols
are chosen, these choices are, at least from the point of view of the communication
system, governed by probabilities; and in fact by probabilities which are not
independent, but which, at any stage of the process, depend upon the preceding
choices. Thus, if we are concerned with English speech, and if the last symbol
chosen is “the,” then the probability that the next word be an article, or a verb
form other than a verbal, is very small. This probabilistic influence stretches
over more than two words, in fact. After the three words “in the event” the
probability for “that” as the next word is fairly high, and for “elephant” as
the next word is very low.

That there are probabilities which exert a certain degree of control over the
English language also becomes obvious if one thinks, for example, of the fact
that in our language the dictionary contains no words whatsoever in which the
initial letter j is followed by b, c, d, f, g, j, k, l, q, r, t, v, w, x, or z; so that
the probability is actually zero that an initial j be followed by any of these letters.
Similarly, anyone would agree that the probability is low for such a sequence of
words as “Constantinople fishing nasty pink.” Incidentally, it is low, but not
zero; for it is perfectly possible to think of a passage in which one sentence
closes with “Constantinople fishing,” and the next begins with “Nasty pink.”
And we might observe in passing that the unlikely four-word sequence under
discussion has occurred in a single good English sentence, namely the one above.

A system which produces a sequence of symbols (which may, of course, be 
letters or musical notes, say, rather than words) according to certain probabilities 
is called a stochastic process, and the special case of a stochastic process in which 
the probabilities depend on the previous events, is called a Markoff process or a 
Markoff chain. Of the Markoff processes which might conceivably generate 
messages, there is a special class which is of primary importance for
communication theory, these being what are called ergodic processes. The analytical
details here are complicated and the reasoning so deep and involved that it has
taken some of the best efforts of the best mathematicians to create the associated theory;
but the rough nature of an ergodic process is easy to understand. It is one which 
produces a sequence of symbols which would be a poll-taker’s dream, because 
any reasonably large sample tends to be representative of the sequence as a whole. 
Suppose that two persons choose samples in different ways, and study what 
trends their statistical properties would show as the samples become larger. 
If the situation is ergodic, then those two persons, however they may have 
chosen their samples, agree in their estimates of the properties of the whole. 
Ergodic systems, in other words, exhibit a particularly safe and comforting sort 
of statistical regularity. 
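
The poll-taker's picture can be tried out on a toy source. The sketch below assumes a hypothetical two-symbol Markoff process (the table `TRANSITIONS`, the helper names, and all the numbers are invented for illustration): each choice depends on the symbol just chosen, and two observers sample the same long sequence in quite different ways.

```python
import random

# Hypothetical source: after "a" the next symbol is "a" with probability 0.9;
# after "b" it is "a" with probability 0.5.
TRANSITIONS = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.5, "b": 0.5}}

def generate(n_symbols, start="b", seed=0):
    """Produce a sequence in which each choice depends on the preceding symbol."""
    rng = random.Random(seed)
    symbol, out = start, []
    for _ in range(n_symbols):
        symbol = "a" if rng.random() < TRANSITIONS[symbol]["a"] else "b"
        out.append(symbol)
    return out

def share_of_a(sample):
    """Fraction of the sample made up of the symbol "a"."""
    return sample.count("a") / len(sample)

sequence = generate(40_000)
first_person = share_of_a(sequence[:20_000])   # an unbroken early stretch
second_person = share_of_a(sequence[7::3])     # every third symbol, arbitrary offset

print(round(first_person, 3), round(second_person, 3))
```

For this particular table the long-run share of "a" works out to 5/6, and both samples should land close to that figure, which is the kind of agreement the text attributes to ergodic sources.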


NOW LET US return to the idea of information. When we have an information
source which is producing a message by successively selecting discrete
symbols (letters, words, musical notes, spots of a certain size, etc.) the probability 
of choice of the various symbols at one stage of the process being dependent on 
the previous choices (i.e., a Markoff process), what about the information asso- 
ciated with this procedure? 


The quantity which uniquely meets the natural requirements that one sets
up for “information” turns out to be exactly that which is known in
thermodynamics as entropy. It is expressed in terms of the various probabilities
involved—those of getting to certain stages in the process of forming messages
and the probabilities that, when in those stages, certain symbols be chosen next.
The formula, moreover, involves the logarithm of probabilities, so that it is a
natural generalization of the logarithmic measure spoken of above in connection
with simple cases.

To those who have studied the physical sciences, it is most significant that 
an entropy-like expression appears in the theory as a measure of information. 
Introduced by Clausius nearly one hundred years ago, closely associated with the
name of Boltzmann, and given deep meaning by Gibbs in his classic work on
statistical mechanics, entropy has become so basic and pervasive a concept that
Eddington remarks “The law that entropy always increases—the second law of
thermodynamics—holds, I think, the supreme position among the laws of
Nature.”

In the physical sciences, the entropy associated with a situation is a measure
of the degree of randomness, or of “shuffled-ness” if you will, in the situation;
and the tendency of physical systems to become less and less organized, to
become more and more perfectly shuffled, is so basic that Eddington argues that
it is primarily this tendency which gives time its arrow—which would reveal
to us, for example, whether a movie of the physical world is being run forward
or backward.

Thus when one meets the concept of entropy in communication theory he
has a right to be rather excited—a right to suspect that one has hold of
something that may turn out to be basic and important. That information be measured
by entropy is, after all, natural when we remember that information, in
communication theory, is associated with the amount of freedom of choice we have
in constructing messages. Thus for a communication source one can say, just
as we would also say it of a thermodynamic ensemble, “This situation is highly
organized, it is not characterized by a large degree of randomness or of choice—
that is to say, the information (or the entropy) is low.” We will return to this
point later, for unless I am quite mistaken, it is an important aspect of the more 
general significance of this theory. 


HAVING CALCULATED the entropy (or the information, or the freedom of
choice) of a certain information source, one can compare this to the
maximum value this entropy could have, subject only to the condition that the
source continue to employ the same symbols. The ratio of the actual to the
maximum entropy is called the relative entropy of the source. If the relative
entropy of a certain source is, say, 0.8, this roughly means that this source is, in
its choice of symbols to form a message, about 80 per cent as free as it could
possibly be with these same symbols. One minus the relative entropy is called
the redundancy. This is the fraction of the structure of the message which is
determined not by the free choice of the sender, but rather by the accepted
statistical rules governing the use of the symbols in question. It is sensibly called
redundancy, for this fraction of the message is in fact redundant in something
close to the ordinary sense; that is to say, this fraction of the message is
unnecessary (and hence repetitive or redundant) in the sense that if it were
missing the message would still be essentially complete, or at least could be
completed.
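
A small numerical sketch may make the two ratios concrete. It assumes the simplest possible source, one whose symbols are chosen independently with fixed probabilities, and uses the entropy expression given later in this section; the four-symbol source and the function names are hypothetical. A real language also has structure stretching across several symbols, which this sketch ignores.

```python
import math

def entropy_bits(probs):
    """Entropy in bits of independent choices with the given probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def relative_entropy(probs):
    """Actual entropy divided by the maximum attainable with the same symbols."""
    if len(probs) < 2:
        raise ValueError("need at least two symbols")
    return entropy_bits(probs) / math.log2(len(probs))

def redundancy(probs):
    """One minus the relative entropy."""
    return 1.0 - relative_entropy(probs)

# Hypothetical four-symbol source, chosen so that the arithmetic is exact.
source = [0.5, 0.25, 0.125, 0.125]
print(entropy_bits(source), relative_entropy(source), redundancy(source))
```

With these probabilities the entropy is 1.75 bits against a possible 2, so the source is 87.5 per cent as free as it could be and its redundancy is 12.5 per cent.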


It is most interesting to note that the redundancy of English is just about
50 per cent,* so that about half of the letters or words we choose in writing
or speaking are under our free choice, and about half (although we are not
ordinarily aware of it) are really controlled by the statistical structure of the
language. Apart from more serious implications, which again we will postpone
to our final discussion, it is interesting to note that a language must have at
least 50 per cent of real freedom (or relative entropy) in the choice of letters
if one is to be able to construct satisfactory crossword puzzles. If it has
complete freedom, then every array of letters is a crossword puzzle. If it has only
20 per cent of freedom, then it would be impossible to construct crossword
puzzles in such complexity and number as would make the game popular.
Shannon has estimated that if the English language had only about 30 per cent
redundancy, then it would be possible to construct three-dimensional crossword
puzzles.

* The 50 per cent estimate accounts only for statistical structure out to about eight
letters, so that the ultimate value is presumably a little higher. See also A. Rapoport's
discussion of redundancy in this issue [Ed.].

Before closing this section on information, it should be noted that the real
reason that Level A analysis deals with a concept of information which
characterizes the whole statistical nature of the information source, and is not
concerned with the individual messages (and not at all directly concerned with the
meaning of the individual messages), is that from the point of view of
engineering, a communication system must face the problem of handling any message
that the source can produce. If it is not possible or practicable to design a
system which can handle everything perfectly, then the system should be designed
to handle well the jobs it is most likely to be asked to do, and should resign
itself to be less efficient for the rare task. This sort of consideration leads at
once to the necessity of characterizing the statistical nature of the whole ensemble
of messages which a given kind of source can and will produce. And information,
as used in communication theory, does just this.


ALTHOUGH IT IS NOT at all the purpose of this paper to be concerned with
mathematical details, it nevertheless seems essential to have as good an
understanding as possible of the entropy-like expression which measures
information. If one is concerned, as in a simple case, with a set of n independent
symbols, or a set of n independent complete messages for that matter, whose
probabilities of choice are $p_1, p_2, \ldots, p_n$, then the actual expression for the
information is


$$H = -\left[\, p_1 \log p_1 + p_2 \log p_2 + \cdots + p_n \log p_n \,\right]$$


This looks a little complicated, but let us see how this expression behaves 
in some simple cases. 

Suppose first that we are choosing only between two possible messages, whose
probabilities are then $p_1$ for the first and $p_2 = 1 - p_1$ for the other. If one
reckons, for this case, the numerical value of H, it turns out that H has its
largest value, namely one, when the two messages are equally probable; that
is to say, when one is completely free to choose between the two messages.
Just as soon as one message becomes more probable than the other ($p_1$ greater
than $p_2$, say), the value of H decreases. And when one message is very probable
($p_1$ almost one and $p_2$ almost zero, say), the value of H is very small (almost
zero).

In the limiting case where one probability is unity (certainty) and all the 
others zero (impossibility), then H is zero (no uncertainty at all—no freedom 
of choice—no information). 

Thus H is largest when the two probabilities are equal (i.e., when one is
completely free and unbiased in the choice), and reduces to zero when one’s
freedom of choice is gone.

The situation just described is in fact typical. If there are many, rather
than two, choices, then H is largest when the probabilities of the various choices
are as nearly equal as circumstances permit—when one has as much freedom as 
possible in making a choice, being as little as possible driven toward some 
certain choices which have more than their share of probability. Suppose, on the 
other hand, that one choice has a probability near one so that all the other 
choices have probabilities near zero. This is clearly a situation in which one is 
heavily influenced toward one particular choice, and hence has little freedom of 
choice. And H in such a case does calculate to have a very small value—the 
information (the freedom of choice, the uncertainty) is low. 

When the number of cases is fixed, we have just seen that then the in- 
formation is the greater, the more nearly equal are the probabilities of the 
various cases. There is another important way of increasing H, namely by in- 
creasing the number of cases. More accurately, if all choices are equally likely, 
the more choices there are, the larger H will be. There is more “information” 
if you select freely out of a set of fifty standard messages, than if you select 
freely out of a set of twenty-five. 
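
The behavior of H described in the last several paragraphs can be reproduced numerically. The sketch below is illustrative only: it takes logarithms to the base 2, treats a term with zero probability as contributing nothing, and the function name `entropy_bits` is an assumption of this example.

```python
import math

def entropy_bits(probs):
    """H = -(p1 log p1 + ... + pn log pn), with logarithms to the base 2."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    # A choice with probability zero contributes nothing to the sum.
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Two messages: H is largest when they are equally probable and
# shrinks toward zero as one of them becomes nearly certain.
for p1 in (0.5, 0.7, 0.9, 0.99, 1.0):
    print(f"p1 = {p1:<4}  H = {entropy_bits([p1, 1 - p1]):.3f} bits")

# Equally likely choices: more alternatives give a larger H.
h25 = entropy_bits([1 / 25] * 25)
h50 = entropy_bits([1 / 50] * 50)
print(f"25 messages: {h25:.3f} bits; 50 messages: {h50:.3f} bits")
```

Going from twenty-five to fifty equally likely messages doubles the number of choices and so adds exactly one bit.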


Capacity of a Communication Channel 


AFTER THE DISCUSSION of the preceding section, one is not surprised that
the capacity of a channel is to be described not in terms of the number
of symbols it can transmit, but rather in terms of the information it transmits.
Or better, since this last phrase lends itself particularly well to a misinterpreta- 
tion of the word information, the capacity of a channel is to be described in 
terms of its ability to transmit what is produced out of a source of a given 
information. 


If the source is of a simple sort in which all symbols are of the same time
duration (which is the case, for example, with teletype), if the source is such
that each symbol chosen represents s bits of information, being freely chosen
from among $2^s$ symbols, and if the channel can transmit, say, n symbols per
second, then the capacity C of the channel is defined to be ns bits per second.
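
As a worked instance of this definition, here is a brief sketch. The figures (32 symbols, 10 symbols per second) are hypothetical and chosen only to keep the arithmetic simple; they are not taken from any particular teletype system.

```python
import math

def simple_capacity(symbols_per_second, alphabet_size):
    """Capacity C = n * s in bits per second, where s = log2(alphabet size)."""
    bits_per_symbol = math.log2(alphabet_size)    # s
    return symbols_per_second * bits_per_symbol   # n * s

# Hypothetical channel: 32 equally likely symbols (s = 5), 10 symbols per second.
capacity = simple_capacity(symbols_per_second=10, alphabet_size=32)
print(capacity, "bits per second")
```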

In a more general case, one has to take account of the varying lengths of 
the various symbols. Thus the general expression for capacity of a channel 
involves the logarithm of the numbers of symbols of certain time duration 
(which introduces, of course, the idea of information and corresponds to the 
factor s in the simple case of the preceding paragraph); and also involves the
number of such symbols handled (which corresponds to the factor n of the
preceding paragraph). Thus in the general case, capacity measures not the
number of symbols transmitted per second, but rather the amount of
information transmitted per second, using bits per second as its unit.


Coding


AT THE OUTSET it was pointed out that the transmitter accepts the message
and turns it into something called the signal, the latter being what actually 
passes over the channel to the receiver. 

The transmitter, in such a case as telephony, merely changes the audible voice 
signal over into something (the varying electrical current on the telephone wire) 
which is at once clearly different but clearly equivalent. But the transmitter may 
carry out a much more complex operation on the message to produce the signal. 
It could, for example, take a written message and use some code to encipher 
this message into, say, a sequence of numbers, these numbers being sent over
the channel as the signal. 

Thus one says, in general, that the function of the transmitter is to encode, and 
that of the receiver is to decode, the message. The theory provides for very sophis- 
ticated transmitters and receivers—such, for example, as possess “memories,” 
so that the way they encode a certain symbol of the message depends not only 
upon this one symbol, but also upon previous symbols of the message and the 
way they have been encoded. 

We are now in a position to state the fundamental theorem, produced in 
this theory, for a noiseless channel transmitting discrete symbols. This theorem 
relates to a communication channel which has a capacity of C bits per second, 
accepting signals from a source of entropy (or information) of H bits per 
second. The theorem states that by devising proper coding procedures for the 
transmitter it is possible to transmit symbols over the channel at an average 
rate * which is nearly C/H but which, no matter how clever the coding, can 
never exceed C/H. 

The significance of this theorem is to be discussed more usefully a little later, 
when we have the more general case when noise is present. For the moment, 
though, it is important to notice the critical role which coding plays. 

Remember that the entropy (or information) associated with the process 
which generates messages or signals is determined by the statistical character of 
the process—by the various probabilities for arriving at message situations and 
for choosing, when in those situations, the next symbols. The statistical nature 
of messages is entirely determined by the character of the source. But the sta- 
tistical character of the signal as actually transmitted by a channel, and hence 
the entropy in the channel, is determined both by what one attempts to feed 
into the channel and by the capabilities of the channel to handle different signal 
situations. For example, in telegraphy, there have to be spaces between dots 
and dots, between dots and dashes, and between dashes and dashes, or the dots 
and dashes would not be recognizable. 


*We remember that the capacity C involves the idea of information transmitted per 
second, and is thus measured in bits per second. The entropy H here measures information 
per symbol, so that the ratio of C to H measures symbols per second.

Now it turns out that when a channel does have certain constraints of this
sort, which limit complete signal freedom, there are certain statistical signal 
characteristics which lead to a signal entropy which is larger than it would be 
for any other statistical signal structure, and in this important case, the signal 
entropy is exactly equal to the channel capacity. 

In terms of these ideas, it is now possible precisely to characterize the most 
efficient kind of coding. The best transmitter, in fact, is that which codes the 
message in such a way that the signal has just those optimum statistical char- 
acteristics which are best suited to the channel to be used—which in fact 
maximize the signal (or one may say, the channel) entropy and make it equal 
to the capacity C of the channel. 

This kind of coding leads, by the fundamental theorem above, to the maxi- 
mum rate C/H for the transmission of symbols. But for this gain in trans- 
mission rate, one pays a price. For rather perversely it happens that as one makes 
the coding more and more nearly ideal, one is forced to longer and longer 
delays in the process of coding. Part of this dilemma is met by the fact that in 
electronic equipment “long” may mean an exceedingly small fraction of a
second, and part by the fact that one makes a compromise, balancing the gain 
in transmission rate against loss of coding time. 
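
To see how coding affects the rate of transmission, consider a sketch that uses Huffman's procedure. That procedure is not described in this paper; it is brought in here only as one familiar way of building an efficient binary code. The four-symbol source, the channel figure of 100 binary digits per second, and all names in the code are hypothetical.

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Codeword length, in binary digits, that Huffman's procedure gives each symbol."""
    # Heap entries: (probability, tie_breaker, symbols merged so far).
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1  # each merge adds one digit to these codewords
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

source = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
entropy = sum(-p * math.log2(p) for p in source.values())   # H, bits per symbol
lengths = huffman_code_lengths(source)
average_length = sum(source[s] * lengths[s] for s in source)

capacity = 100.0                       # C: binary digits (bits) per second
fixed_rate = capacity / 2              # plain code: two digits for every symbol
huffman_rate = capacity / average_length
bound = capacity / entropy             # C/H from the fundamental theorem

print(lengths, average_length, fixed_rate, huffman_rate, bound)
```

For this source the efficient code averages 1.75 digits per symbol, which equals H, so the symbol rate reaches the bound C/H, while the plain two-digit code falls short of it. Sources whose probabilities are not powers of one half can only approach the bound, and only by coding longer and longer blocks, which is the delay spoken of above.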


Noise 


HOW DOES noise affect information? Information is, we must steadily
remember, a measure of one’s freedom of choice in selecting a message. The
greater this freedom of choice, and hence the greater the information, the
greater is the uncertainty that the message actually selected is some particular
one. Thus greater freedom of choice, greater uncertainty, greater information
go hand in hand.

If noise is introduced, then the received message contains certain distortions, 
certain errors, certain extraneous material, that would certainly lead one to say 
that the received message exhibits, because of the effects of the noise, an in- 
creased uncertainty. But if the uncertainty is increased, the information is 
increased, and this sounds as though the noise were beneficial!

It is generally true that when there is noise, the received signal exhibits 
greater information—or better, the received signal is selected out of a more 
varied set than is the transmitted signal. This is a situation which beautifully 
illustrates the semantic trap into which one can fall if he does not remember 
that “information” is used here with a special meaning that measures freedom 
of choice and hence uncertainty as to what choice has been made. It is there- 
fore possible for the word information to have either good or bad connotations. 
Uncertainty which arises by virtue of freedom of choice on the part of the 
sender is desirable uncertainty. Uncertainty which arises because of errors or 
because of the influence of noise, is undesirable uncertainty.

It is thus clear where the joker is in saying that the received signal has
more information. Some of this information is spurious and undesirable and
has been introduced via the noise. To get the useful information in the received
signal we must subtract out this spurious portion.


Before we can clear up this point we have to stop for a little detour. Sup- 
pose one has two sets of symbols, such as the message symbols generated by the 
information source, and the signal symbols which are actually received. The
probabilities of these two sets of symbols are interrelated, for clearly the proba- 
bility of receiving a certain symbol depends upon what symbol was sent. With 
no errors from noise or from other causes, the received signals would correspond 
precisely to the message symbols sent; and in the presence of possible error, 
the probabilities for received symbols would obviously be loaded heavily on 
those which correspond, or closely correspond, to the message symbols sent. 


Now in such a situation one can calculate what is called the entropy of one 
set of symbols relative to the other. Let us, for example, consider the entropy of
the message relative to the signal. It is unfortunate that we cannot understand 
the issues involved here without going into some detail. Suppose for the moment 
that one knows that a certain signal symbol has actually been received. Then each 
message symbol takes on a certain probability—relatively large for the symbol 
identical with or the symbols similar to the one received, and relatively small 
for all others. Using this set of probabilities, one calculates a tentative entropy 
value. This is the message entropy on the assumption of a definite known 
received or signal symbol. Under any good conditions its value is low, since the
probabilities involved are not spread around rather evenly on the various cases, 
but are heavily loaded on one or a few cases. Its value would be zero in any 
case where noise was completely absent, for then, the signal symbol being 
known, all message probabilities would be zero except for one symbol (namely 
the one received), which would have a probability of unity. 

For each assumption as to the signal symbol received, one can calculate one 
of these tentative message entropies. Calculate all of them, and then average 
them, weighting each one in accordance with the probability of the signal symbol 
assumed in calculating it. Entropies calculated in this way, when there are two 
sets of symbols to consider, are called relative entropies. The particular one just 
described is the entropy of the message relative to the signal, and Shannon has 
named this also the equivocation. 
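
The averaging procedure just described can be written out step by step. The sketch below assumes a hypothetical channel carrying two message symbols, 0 and 1, each sent half the time and each received wrongly one time in ten; the function names and data layout are choices made for this illustration.

```python
import math

def entropy_bits(probs):
    """Entropy in bits of a set of probabilities; zero terms contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def equivocation(p_message, p_signal_given_message):
    """Average uncertainty about the message once the received signal is known."""
    messages = list(p_message)
    signals = {s for row in p_signal_given_message.values() for s in row}
    total = 0.0
    for s in sorted(signals):
        # Probability of each message occurring together with received symbol s.
        joint = [p_message[m] * p_signal_given_message[m].get(s, 0.0) for m in messages]
        p_s = sum(joint)
        if p_s == 0:
            continue
        posterior = [j / p_s for j in joint]    # message probabilities, s being known
        total += p_s * entropy_bits(posterior)  # weight by the probability of s
    return total

p_message = {0: 0.5, 1: 0.5}
noiseless = {0: {0: 1.0}, 1: {1: 1.0}}
noisy = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}

print(equivocation(p_message, noiseless), equivocation(p_message, noisy))
```

Without noise the equivocation is zero; with one error in ten it rises to a little under half a bit per symbol.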


From the way this equivocation is calculated, we can see what its significance
is. It measures the average uncertainty in the message when the signal is known.
If there were no noise, then there would be no uncertainty concerning the
message if the signal is known. If the information source has any residual
uncertainty after the signal is known, then this must be undesirable uncertainty
due to noise.

THE DISCUSSION of the last few paragraphs centers around the quantity “the
average uncertainty in the message source when the received signal is known.”
It can equally well be phrased in terms of the similar quantity “the average
uncertainty concerning the received signal when the message sent is known.”
This latter uncertainty would, of course, also be zero if there were no noise.
As to the interrelationship of these quantities, it is easy to prove that 


$$H(x) - H_y(x) = H(y) - H_x(y)$$


where H(x) is the entropy or information of the source of messages; H(y) the
entropy or information of received signals; $H_y(x)$ the equivocation, or the
uncertainty in the message source if the signal be known; $H_x(y)$ the uncertainty
in the received signals if the messages sent be known, or the spurious part of
the received signal information which is due to noise. The right side of this
equation is the useful information which is transmitted in spite of the bad
effect of the noise.
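
The equality can be verified on any small table of probabilities. The following sketch uses a hypothetical table `joint` giving the probability that message symbol x is sent and signal symbol y is received; the numbers and helper names are invented for the purpose.

```python
import math

def entropy_bits(probs):
    """Entropy in bits of a set of probabilities; zero terms contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical probabilities of (message symbol x, received symbol y) pairs.
joint = {
    ("a", "a"): 0.45, ("a", "b"): 0.05,
    ("b", "a"): 0.10, ("b", "b"): 0.40,
}

def marginal(index):
    """Probabilities of x alone (index 0) or of y alone (index 1)."""
    out = {}
    for pair, p in joint.items():
        out[pair[index]] = out.get(pair[index], 0.0) + p
    return out

def conditional_entropy(given_index):
    """Average entropy of the other symbol when the symbol at given_index is known."""
    total = 0.0
    for value, p_value in marginal(given_index).items():
        conditional = [p / p_value for pair, p in joint.items() if pair[given_index] == value]
        total += p_value * entropy_bits(conditional)
    return total

h_x = entropy_bits(marginal(0).values())
h_y = entropy_bits(marginal(1).values())
equivocation = conditional_entropy(given_index=1)    # H_y(x): message, signal known
noise_entropy = conditional_entropy(given_index=0)   # H_x(y): signal, message known

left, right = h_x - equivocation, h_y - noise_entropy
print(round(left, 6), round(right, 6))
```

Both sides give the same figure, the useful information per symbol carried through this hypothetical channel.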

It is now possible to explain what one means by the capacity C of a noisy 
channel. It is, in fact, defined to be equal to the maximum rate (in bits per
second) at which useful information (i.e., total uncertainty minus noise
uncertainty) can be transmitted over the channel.

Why does one speak, here, of a “maximum” rate? What can one do, that
is, to make this rate larger or smaller? The answer is that one can affect this 
rate by choosing a source whose statistical characteristics are suitably related to 
the restraints imposed by the nature of the channel. That is, one can maximize 
the rate of transmitting useful information by using proper coding. 
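
What "maximizing over the source" means can be sketched for one simple case. The example below assumes a binary symmetric channel (each symbol is flipped with a fixed probability), which is not a channel discussed in the text, and it measures useful information per symbol rather than per second; multiplying by the symbol rate would give bits per second. The names `useful_information` and `FLIP` are illustrative only.

```python
import math

def h2(p):
    """Entropy in bits of a two-way choice with probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def useful_information(p_one, flip):
    """H(y) - H_x(y) per symbol when the source emits a 1 with probability p_one."""
    p_y_one = p_one * (1 - flip) + (1 - p_one) * flip  # chance a 1 is received
    return h2(p_y_one) - h2(flip)                      # noise part is h2(flip)

FLIP = 0.1  # hypothetical probability that noise flips a symbol

# Try many source statistics and keep the one that transmits the most.
candidates = [i / 1000 for i in range(1001)]
best_p = max(candidates, key=lambda p: useful_information(p, FLIP))
capacity_per_symbol = useful_information(best_p, FLIP)

print(best_p, capacity_per_symbol)
```

For this symmetric channel the best source is the one that uses its two symbols equally often; a lopsided source transmits less useful information over the same channel.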


Now, finally, let us consider the fundamental theorem for a noisy channel.
Suppose that this noisy channel has, in the sense just described, a
capacity C, and suppose it is accepting from an information source characterized
by an entropy of H(x) bits per second, the entropy of the received signals being
H(y) bits per second. If the channel capacity C is equal to or larger than
H(x), then by devising appropriate coding systems, the output of the source
can be transmitted over the channel with as little error as one pleases. However
small a frequency of error you specify, there is a code which meets the demand.
But if the channel capacity C is less than H(x), the entropy of the source from
which it accepts messages, then it is impossible to devise codes which reduce
the error frequency as low as one may please.

However clever one is with the coding process, it will always be true that
after the signal is received there remains some undesirable (noise) uncertainty
about what the message was; and this undesirable uncertainty, this
equivocation, will always be equal to or greater than $H(x) - C$. Furthermore,
there is always at least one code which is capable of reducing this undesirable
uncertainty concerning the message down to a value which exceeds $H(x) - C$ by
an arbitrarily small amount.

The most important aspect, of course, is that the minimum undesirable or
spurious uncertainties cannot be reduced further, no matter how complicated or 
appropriate the coding process. This powerful theorem gives a precise and 
almost startlingly simple description of the utmost dependability one can ever 
obtain from a communication channel which operates in the presence of noise. 
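
As a rough numerical illustration of the bound just stated, with hypothetical figures chosen only to make the arithmetic visible, the case of a source that overloads its channel can be written out as follows.

```latex
% Hypothetical figures: a source of 1200 bits per second feeding a
% channel whose capacity is 1000 bits per second, so that H(x) > C.
\[
  H_y(x) \;\ge\; H(x) - C \;=\; 1200 - 1000 \;=\; 200 \ \text{bits per second},
\]
\[
  \text{and for every } \epsilon > 0 \text{ some code achieves }
  H_y(x) \;<\; H(x) - C + \epsilon .
\]
```

No coding scheme brings the equivocation below 200 bits per second in this case, though a suitable code can come as close to that figure as one pleases.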

One practical consequence, pointed out by Shannon, should be noted. Since 
English is about 50 per cent redundant, it would be possible to save about one- 
half the time of ordinary telegraphy by a proper encoding process, provided one 
were going to transmit over a noiseless channel. When there is noise in a 
channel, however, there is some real advantage in not using a coding process that 
eliminates all the redundancy. For the remaining redundancy helps combat the 
noise. This is very easy to see, for just because of the fact that the redundancy 
of English is high, one has, for example, little or no hesitation about correcting 
errors in spelling that have arisen during transmission. 
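
The way redundancy combats noise can be sketched with a deliberately crude code. The example below is not the redundancy of English, and it is not a scheme taken from Shannon; it assumes a simple repetition code, in which every symbol is sent three times and the receiver takes a majority vote.

```python
def encode(bits, repeat=3):
    """Add redundancy by sending every bit `repeat` times."""
    return [b for b in bits for _ in range(repeat)]

def decode(signal, repeat=3):
    """Recover each bit by majority vote over its block of repeats."""
    blocks = [signal[i:i + repeat] for i in range(0, len(signal), repeat)]
    return [1 if sum(block) * 2 > repeat else 0 for block in blocks]

message = [1, 0, 1, 1]
sent = encode(message)

# Hypothetical noise: one symbol is flipped in the first block
# and one in the last block.
received = list(sent)
received[1] ^= 1
received[9] ^= 1

recovered = decode(received)
print(recovered == message)
```

The price is that the coded message takes three times as long to send, which is the trade between economy and dependability that the paragraph above describes.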


Continuous Messages 


UP TO THIS POINT we have been concerned with messages formed out of
discrete symbols, as words are formed of letters, sentences of words, a 
melody of notes, or a halftone picture of a finite number of discrete spots. What 
happens to the theory if one considers continuous messages, such as the speaking 
voice with its continuous variation of pitch and energy? 

Very roughly one may say that the extended theory is somewhat more difficult
and complicated mathematically, but not essentially different. Many of the
above statements for the discrete case require no modification, and others require
only minor change.


One circumstance which helps a good deal is the following. As a practical 
matter, one is always interested in a continuous signal which is built up of 
simple harmonic constituents of not all frequencies, but rather of frequencies 
which lie wholly within a band from zero frequency to, say, a frequency of 
W cycles per second. Thus although the human voice does contain higher 
frequencies, very satisfactory communication can be achieved over a telephone 
channel that handles frequencies only up to, say, four thousand. With fre- 
quencies up to ten or twelve thousand, high fidelity radio transmission of sym- 
phonic music is possible, etc. 

There is a very convenient mathematical theorem which states that a con- 
tinuous signal, T seconds in duration and band-limited in frequency to the range 
from 0 to W, can be completely specified by stating 2TW numbers. This is
really a remarkable theorem. Ordinarily a continuous curve can be only approxi- 
mately characterized by stating any finite number of points through which it 
passes, and an infinite number would in general be required for complete 
information about the curve. But if the curve is built up out of simple harmonic 
constituents of a limited number of frequencies, as a complex sound is built 
up out of a limited number of pure tones, then a finite number of parameters
is all that is necessary. This has the powerful advantage of reducing the char- 
acter of the communication problem for continuous signals from a complicated 
situation where one would have to deal with an infinite number of variables 
to a considerably simpler situation where one deals with a finite (though large) 
number of variables. 
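
A closely related and simpler fact can be checked directly. The sketch below is not the theorem itself: it assumes a signal of duration T built only from the harmonics 0, 1/T, 2/T, ..., K/T, so that W corresponds to K/T, and it shows that 2K + 1 equally spaced samples (one more than 2TW) recover every amplitude exactly. The amplitudes in `COEFFS` are invented for illustration.

```python
import cmath
import math

def signal_value(t, duration, coeffs):
    """Value at time t of a signal built from harmonics k/duration.

    coeffs[k] is the pair (a_k, b_k) of cosine and sine amplitudes.
    """
    return sum(
        a * math.cos(2 * math.pi * k * t / duration)
        + b * math.sin(2 * math.pi * k * t / duration)
        for k, (a, b) in enumerate(coeffs)
    )

def recover_coeffs(samples, top_harmonic):
    """Recover (a_k, b_k) from equally spaced samples by a discrete Fourier sum."""
    n_samples = len(samples)
    recovered = []
    for k in range(top_harmonic + 1):
        c = sum(
            s * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n, s in enumerate(samples)
        ) / n_samples
        recovered.append((c.real, 0.0) if k == 0 else (2 * c.real, -2 * c.imag))
    return recovered

DURATION = 1.0                                               # T, in seconds
COEFFS = [(0.5, 0.0), (1.0, -0.3), (0.0, 0.8), (0.25, 0.1)]  # harmonics 0..3
TOP = len(COEFFS) - 1                                        # K = T * W
N_SAMPLES = 2 * TOP + 1                                      # finitely many numbers

samples = [
    signal_value(n * DURATION / N_SAMPLES, DURATION, COEFFS)
    for n in range(N_SAMPLES)
]
recovered = recover_coeffs(samples, TOP)
print(recovered)
```

Because the recovered amplitudes match the original ones, the whole continuous curve, and not merely its sampled points, is fixed by the seven numbers.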

In the theory for the continuous case there are developed formulas which 
describe the maximum capacity C of a channel of frequency band-width W, 
when the average power used in transmitting is P, the channel being subject to 
a noise of power N, this noise being “white thermal noise” of a special kind
which Shannon defines. This white thermal noise is itself band-limited in fre- 
quency, and the amplitudes of the various frequency constituents are subject to 
a normal (Gaussian) probability distribution. Under these circumstances, 
Shannon obtains the theorem, again really quite remarkable in its simplicity 
and its scope, that it is possible, by the best coding, to transmit binary digits 
at the rate of 


$$W \log_2 \frac{P + N}{N}$$

bits per second and have an arbitrarily low frequency of error. But this rate 
cannot possibly be exceeded, no matter how clever the coding, without giving 
rise to a definite frequency of errors. For the case of arbitrary noise, rather than 
the special “white thermal’’ noise assumed above, Shannon does not succeed in 
deriving one explicit formula for channel capacity, but does obtain useful upper 
and lower limits for channel capacity. And he also derives limits for channel 
capacity when one specifies not the average power of the transmitter, but rather 
the peak instantaneous power. 
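
The capacity formula for white thermal noise is easy to evaluate. The following sketch uses the 4000-cycle telephone band mentioned earlier; the power figures are hypothetical and serve only to show how the rate responds to the ratio of signal to noise.

```python
import math

def channel_capacity(bandwidth, signal_power, noise_power):
    """Bits per second for a band of `bandwidth` cycles per second,
    average signal power P and white thermal noise power N."""
    return bandwidth * math.log2((signal_power + noise_power) / noise_power)

# Hypothetical powers, in arbitrary but equal units.
weak = channel_capacity(4000, 3.0, 1.0)     # P + N is 4 times N
strong = channel_capacity(4000, 15.0, 1.0)  # P + N is 16 times N

print(weak, strong)
```

Raising the signal power fivefold here only doubles the rate, since capacity grows with the logarithm of the power ratio, whereas doubling the band-width doubles the rate outright.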

Finally it should be stated that Shannon obtains results which are necessarily
somewhat less specific, but which are of obviously deep and sweeping significance,
and which, for a general sort of continuous message or signal, characterize the
fidelity of the received message, and the concepts of rate at which a source
generates information, rate of transmission, and channel capacity, all of these
being relative to certain fidelity requirements.


THE INTERRELATIONSHIP OF THE THREE LEVELS 
OF COMMUNICATION PROBLEMS 


IN THE FIRST section of this paper it was suggested that there are three levels
at which one may consider the general communication problem. Namely, 
one may ask: 


Level A. How accurately can the symbols of communication be transmitted? 


Level B. How precisely do the transmitted symbols convey the desired 
meaning? 

Level C. How effectively does the received meaning affect conduct in the
desired way? 


It was suggested that the mathematical theory of communication, as developed 
by Shannon, Wiener, and others, and particularly the more definitely engineering 
theory treated by Shannon, although ostensibly applicable only to Level A 
problems, actually is helpful and suggestive for the Level B and C problems. 

We then took a look, in the next section, at what this mathematical theory
is, what concepts it develops, what results it has obtained. It is the purpose of
this concluding section to review the situation, and see to what extent and in
what terms the original section was justified in indicating that the progress made
at Level A is capable of contributing to Levels B and C, and in indicating
that the interrelation of the three levels is so considerable that one’s final
conclusion may be that the separation into the three levels is really artificial and
undesirable.


Generality of the Theory at Level A 


THE OBVIOUS first remark, and indeed the remark that carries the major burden
of the argument, is that the mathematical theory is exceedingly general
in its scope, fundamental in the problems it treats, and of classic simplicity and
power in the results it reaches.


This is a theory so general that one does not need to say what kinds of
symbols are being considered—whether written letters or words, or musical 
notes, or spoken words, or symphonic music, or pictures. The theory is deep 
enough so that the relationships it reveals indiscriminately apply to all these and 
to other forms of communication. This means, of course, that the theory is 
sufficiently imaginatively motivated so that it is dealing with the real inner core 
of the communication problem—with those basic relationships which hold in 
general, no matter what special form the actual case may take. 


It is an evidence of this generality that the theory contributes importantly 
to, and in fact is really the basic theory of cryptography which is, of course, a 
form of coding. In a similar way, the theory contributes to the problem of 
translation from one language to another, although the complete story here 
clearly requires consideration of meaning, as well as of information. Similarly, 
the ideas developed in this work connect so closely with the problem of the 
logical design of great computers that it is no surprise that Shannon has just 
written a paper on the design of a computer which would be capable of playing 
a skillful game of chess. And it is of further direct pertinence to the present 
contention that this paper closes with the remark that either one must say that 
such a computer “thinks,” or one must substantially modify the conventional 
implication of the verb “to think.” 

AS A SECOND point, it seems clear that an important contribution has been
made to any possible general theory of communication by the formalization
on which the present theory is based. It seems at first obvious to diagram a
communication system as it is done at the outset of this theory; but this
breakdown of the situation must be very deeply sensible and appropriate, as one
becomes convinced when he sees how smoothly and generally this viewpoint 
leads to the central issues. It is almost certainly true that a consideration of 
communication on levels B and C will require additions to the schematic diagram 
above, but it seems equally likely that what is required are minor additions, and 
no real revision. 

Thus when one moves to levels B and C, it may prove to be essential to take 
account of the statistical characteristics of the destination. One can imagine, as 
an addition to the diagram, another box labeled “Semantic Receiver” interposed
between the engineering receiver (which changes signals to messages) and the 
destination. This semantic receiver subjects the message to a second decoding, 
the demand on this one being that it must match the statistical semantic charac- 
teristics of the message to the statistical semantic capacities of the totality of 
receivers, or of that subset of receivers which constitute the audience one wishes 
to affect. 


Similarly one can imagine another box in the diagram which, inserted between
the information source and the transmitter, would be labeled “semantic
noise,” the box previously labeled as simply “noise” now being labeled
“engineering noise.” From this source are imposed into the signal the perturbations or
distortions of meaning which are not intended by the source but which in- 
escapably affect the destination. And the problem of semantic decoding must 
take this semantic noise into account. It is also possible to think of an adjust- 
ment of original message so that the sum of message meaning plus semantic 
noise is equal to the desired total message meaning at the destination. 


Thirdly, it seems highly suggestive for the problem at all levels that error
and confusion arise and fidelity decreases, when, no matter how good the
coding, one tries to crowd too much over a channel (i.e., H > C). Here again
a general theory at all levels will surely have to take into account not only the
capacity of the channel but also (even the words are right!) the capacity of
the audience. If one tries to overcrowd the capacity of the audience it is probably
true, by direct analogy, that you do not, so to speak, fill the audience up
and then waste only the remainder by spilling. More likely, and again by direct
analogy, if you overcrowd the capacity of the audience you force a general and
inescapable error and confusion.


Fourthly, it is hard to believe that levels B and C do not have much to learn 
from, and do not have the approach to their problems usefully oriented by, the 
development in this theory of the entropic ideas in relation to the concept of
information. 

The concept of information developed in this theory at first seems disappoint- 
ing and bizarre—disappointing because it has nothing to do with meaning, and 
bizarre because it deals not with a single message but rather with the statistical 
character of a whole ensemble of messages, bizarre also because in these sta- 
tistical terms the two words information and uncertainty find themselves to be 
partners. 

I think, however, that these should be only temporary reactions; and that 
one should say, at the end, that this analysis has so penetratingly cleared the 
air that one is now, perhaps for the first time, ready for a real theory of meaning. 
An engineering communication theory is just like a very proper and discreet 
girl accepting your telegram. She pays no attention to the meaning, whether 
it be sad, or joyous, or embarrassing. But she must be prepared to deal with all
that come to her desk. This idea that a communication system ought to try to deal 
with all possible messages, and that the intelligent way to try is to base design 
on the statistical character of the source, is surely not without significance for 
communication in general. Language must be designed (or developed) with a 
view to the totality of things that man may wish to say; but not being able to 
accomplish everything, it too should do as well as possible as often as possible. 
That is to say, it too should deal with its task statistically.


THE CONCEPT of the information to be associated with a source leads directly,
as we have seen, to a study of the statistical structure of language; and this 
study reveals about the English language, as an example, information which 
seems surely significant to students of every phase of language and communica- 
tion. The idea of utilizing the powerful body of theory concerning Markoff 
processes seems particularly promising for semantic studies, since this theory is 
specifically adapted to handle one of the most significant but difficult aspects of 
meaning, namely the influence of context. One has the vague feeling that in- 
formation and meaning may prove to be something like a pair of canonically 
conjugate variables in quantum theory, they being subject to some joint restric- 
tion that condemns a person to sacrifice of the one as he insists on having much 
of the other. 

Or perhaps meaning may be shown to be analogous to one of the quantities 
on which the entropy of a thermodynamic ensemble depends. The appearance of 
entropy in the theory, as was remarked earlier, is surely most interesting and 
significant. Eddington has already been quoted in this connection, but there is 
another passage in The Nature of the Physical World which seems particularly 
apt and suggestive: 

Suppose that we were asked to arrange the following in two
categories—distance, mass, electric force, entropy, beauty, melody.
I think there are the strongest grounds for placing entropy alongside
beauty and melody, and not with the first three. Entropy is only found 
when the parts are viewed in association, and it is by viewing or hearing 
the parts in association that beauty and melody are discerned. All three 
are features of arrangement. It is a pregnant thought that one of these 
three associates should be able to figure as a commonplace quantity of 
science. The reason why this stranger can pass itself off among the abo- 
rigines of the physical world is that it is able to speak their language, viz., 
the language of arithmetic. 


I feel sure that Eddington would have been willing to include the word 
meaning along with beauty and melody; and I suspect he would have been 
thrilled to see, in this theory, that entropy not only speaks the language of 
arithmetic; it also speaks the language of language. 





When we ask a child to name something, we are teaching him to 
make a response that insures communication. We are also passing on 
to him our own ways of observing. We have various means of re- 
warding him when he is right, punishing him when wrong. We can 
do it by feeding or beating him, but the first can only be infrequent 
and the second tends to cut off all connection with him. We do it 
much more subtly by establishing first a special behavior sequence, 
that of communication. The child's most important lesson is that 
the fitting of stimuli into a communicable form produces “satisfactory”
results. It is difficult to appreciate how deeply this first way
of responding controls all the others, which are later learned through
it. Once this is established it is not necessary to set up an elaborate
apparatus of rewards and punishments to teach each new association. 
By giving the signs of approval or disapproval we can show the child 
instantly whether he has produced the right reactions or not. His 
whole brain system is trained so that it seeks to organize all the 
sensory input into some communicable output—to put it into words. 


— J. Z. YOUNG, Doubt and Certainty in Science.





Circularity of Knowledge* 


R. W. GERARD** 


The input comes chancily out 

The tail end hooks up to the snout 
The bits enter in 

Circle round and enfin 

Emerge and put chaos to rout 


Behavior is caused but still free 
It depends on a good memory. 
Mechanical mice 

Learn a maze in a trice 

And so act intelligently 


Computers use scales or just digits 
To give neat results or the fidgets. 
They may grind out ideas 

But not hopes and fears 

And are feeling, if not mental, idjits 


Cybernetics grew out of our group

Macy¹ welcomes nerve sparks² and nerve soup³
Warren,⁴ Frank⁵ and the rest

Proved to be of the best

So “good,” “cause,” and “brain” loop the loop


Allee oop


* Written for the occasion of the last meeting of the Macy Conference on Cybernetics.

** Director of Research, Neuropsychiatric Institute, Division of Psychiatry, University
of Illinois.

¹ Josiah Macy Jr. Foundation.

² The electrical mechanism of synaptic transmission.

³ The neurohumoral mechanism of synaptic transmission.

⁴ Warren S. McCulloch, chairman of the Cybernetics Conference group.

⁵ Frank Fremont-Smith, Medical Director, Macy Foundation.

FUNDAMENTAL NATURAL
CONCEPTS OF INFORMATION 
THEORY 


EDWARD W. SAMSON * 


[EDITOR'S FOREWORD. Elsewhere in this issue the theory of information is
approached by way of intuitive concepts. One starts with what one thinks one
knows about the meaning of “information,” for example as “that which causes
a change in the amount of knowledge we have,” and proceeds from these common
sense notions to formal mathematical definitions. In the present article the process 
is reversed. Dr. Samson begins with the logical foundations of “knowledge based 
on experience,” on which a formal superstructure of logical reasoning has already 
been erected (in this case the generalized two-valued logic, the so-called Boolian 
Algebra). Then he poses the desired formal properties of a concept which could 
conveniently represent the amount of new knowledge acquired in any experi- 
mental or observational situation. Finally he shows that these assumed properties 
of the concept lead to the definition of “the amount of information” as it appears
in information theory. As a corollary of the analysis, the negative logarithm of 
the probability of an event turns out to be the measure of “surprise” that we 
experience whenever the event occurs. The quantitative relation is an obvious 
one: the smaller the probability, the larger the “surprise”; the surprise is zero 
if we knew the outcome in advance and infinite if we thought the outcome all 
but impossible. 

The mathematically untrained reader may find Dr. Samson's approach un- 
usual, perhaps hard to follow, since the usual thing in any exposition is to start 
with common sense notions and then refine or generalize them to formal ones. 
For example, in developing the theory of trigonometric functions in elementary 
mathematics, one begins with rather concrete notions of line segment ratios and 
then generalizes them to the familiar wave-like functions (sine, cosine, etc.) 
defined for arbitrary values of the independent variable. However in advanced 
mathematics, one often starts from the other end. One asks what are the functions 
which satisfy certain differential equations. As an answer to this question, one 
obtains the well-known trigonometric functions. This is also Dr. Samson's
method (as is the method developed in formal information theory). He asks
what functions satisfy certain properties, which he poses in terms of the logical
constructs concomitant to a formal empirical theory of knowledge. In answer he
gets the well-known definition of “the amount of information.” Along the way,
however, he also obtains “natural” mathematical definitions of the intuitively
familiar notions of “probability” and “surprise.”

* Chief of the Communications Laboratory, Electronics Research Directorate, Air Force
Cambridge Research Center. This paper was published as a report of the A.F.C.R.C. in
October 1951 and is reprinted by permission.

It may be noted that the aims of Dr. Samson's approach are somewhat similar 
to those of R. Carnap in the latter's extensive analysis of the notions involved 
in probability (Cf. R. Carnap, The Logical Foundations of Probability, Chicago: 
The University of Chicago Press, 1950). The orientation in both approaches 
seems to stem from a feeling that the “logical foundations” ought to be rooted 
in intuitive psychological notions (as, for example, the concept of “confirmation” 
in Carnap’s treatment of probability) and not merely in a formal axiomatization. 
This dissatisfaction with formalism (i.e., the exclusive use of deductive processes)
seemed also to motivate the mathematicians of the “intuitionist” school and
Korzybski in his attempt to re-define the fundamental concept of “number.”

This critical attitude can be traced back to Kronecker's famous remark, “God 
made the integers. Everything else is the work of man!” This sentiment is echoed 
in Dr. Samson’s postulate that “the only abstract quality of every objective event
is distinguishability . . . [which] admits only of counting and class distinction
of events as a source of quantitative universal properties . . .” This insistence on
constantly bearing in mind connections between logic and psychology (Korzyb- 
ski's emphasis on the “psycho-logic”) certainly reflects at least one aspect of a 
non-aristotelian orientation. 

The present paper (published here in somewhat abridged form) was written 
as a report of the Air Force Cambridge Research Center. Dr. Samson wishes to 
express acknowledgment to Dr. N. G. Parke for his assistance in the purely 
mathematical phases of the paper and for constructive criticisms. —A. R.]


THE work of C. E. Shannon¹ has introduced into the fields of probability
and statistical theory, as applied to the signals and symbols used in
communication, certain powerful methods of analysis based on the formula

$$H = -\sum_i p_i \log p_i,$$

where $p_i$ is a probability associated with certain situations of interest, and H
is a measure function associated with such various notions as information, choice, 
uncertainty, entropy, knowledge, and capacity for communication. The great 
value of the methods has been amply demonstrated by the scope of the applica- 
tions made. 
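
The formula can be tried out in a few lines. The sketch below assumes logarithms to the base 2, so that H comes out in bits (the formula above leaves the base open), and it treats the negative logarithm of a probability as the “surprise” of the corresponding event, as the Editor's Foreword describes; the function names are illustrative.

```python
import math

def surprise(p):
    """Negative logarithm (base 2) of the probability of an event."""
    return -math.log2(p)

def entropy(probs):
    """H = -sum p_i log p_i: the average surprise over all situations."""
    return sum(p * surprise(p) for p in probs if p > 0)

fair_coin = entropy([0.5, 0.5])
four_equal_choices = entropy([0.25, 0.25, 0.25, 0.25])
foregone_conclusion = entropy([1.0])

print(fair_coin, four_equal_choices, foregone_conclusion)
print(surprise(0.5), surprise(0.001))
```

The surprise is zero for an outcome known in advance and grows without limit as the probability shrinks toward zero, while H averages the surprise over all the situations, each weighted by its probability.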

Shannon’s development leaves the present writer with the sense that there
are scientific roots underlying this structure which are not yet fully visible, and
that it would be valuable to explore the concepts further in fundamental terms.
We wish to attain fundamental scientific insight into the theory of information.
This insight is not merely a matter of noting coincidences or developing
skill. It is not entirely in the domain of pure mathematics, for the meanings of
the mathematical symbols are in their referents (the symbols are not
self-contained). Nor is it entirely in the domain of physics (although observations
are certainly involved), for introspective observations as to the relations of the
“mind” to events will always be present and important. These latter processes
are akin to those involved in the formulation of the laws of pure logic, where
observations of the mental state of conviction, sense of implication, etc. are
required. Such observations fall most properly into the field of semantics.

¹ C. E. Shannon and W. Weaver, The Mathematical Theory of Communication.
Urbana: University of Illinois Press, 1949. Cf. especially Section 6 on “Choice, Uncertainty,
and Entropy.”


The General Problem 


THE WORLD is full of objective events, and of people producing mental events,
or ideas with complex relations to each other and to the objective events 
observed. People communicate to each other ideas or reports of observations by 
means of signals and symbols. Communications may be redundant, partly irrele- 
vant or partly obscured by noise. Permeating all these processes, there exist some 
universally relevant quantitative conceptual phenomena. Our object is to analyze 
precisely what is and can be involved. It is not only desirable to discover what 
exists and is measurable. It is also particularly desirable to eliminate potential 
illusions by clarifying what does not have such universally measurable prop- 
erties, and to show any major distinctions among things that can be involved. 

For simplicity, we shall assume that our theory involves only objective
observable events in the material world as primary material. We must, however,
deal also with the processing of this material by the ‘‘mind,” in generating cer- 
tain senses or concepts relevant to fulfilling the natural purposes of the “mind.” 
Signals, symbols or languages of communication all fall within the scope of the 
theory as manifested objective events. Any theory of values, however, is ruled 
out, because values (in the sense of human good-bad values) are derived from 
subjective feeling events and so are essentially of a different logical type. Any 
scientific theory of values, however, if possible, would very likely have to begin 
with a rather similar development, extended to include value attributes of events. 
Such value attributes would doubtless be functions of people. 

We assume further that in the beginning the mind is a sort of blank slate
on which a record of observed events accumulates as they are observed, in the
form of memory. Intellectual activity concerns relations among experiences,
expectations, purposes, and conduct. An event has intellectual significance or 
meaning to an observer only insofar as it relates to his experience. To consider 
events in their proper relation to an observer, it is essential to specify the 
observer's experience. 

The mind seems to operate consciously on objective experience by noting it
in usually small lots that are fairly clearly bounded and lead to communication
and to thought in terms of declarative sentences known in logic as propositions,
each of which represents some event. The small lots may be aggregated into
larger lots as complex events. They may also be aggregated into sets that char- 
acterize some object or set of objects which constitute an objective system. The 
latter is something that when observed produces events, each a distinct one of 
some specified set of event types known to experience. An event type is a class of 
events similar in all respects noted, except the occasion of observation. An occa- 
sion of complete observation of an objective system will be called an experiment. 
The total result will also be called an experiment. An objective system may 
comprise part or all of the observer's actual world of experience but we wish 
to view such a system as comprising his world. As an elementary example, we 
can view the tossing of a coin as an objective system, the outcome of a toss an 
event type, the observation of the outcome an experiment. Events and event 
types² correspond to propositions and propositional functions in logic, where
the things dealt with are experiments: a proposition declares an event and a 
propositional function describes a class of experiments, or event type. 


Class Diagram of Experience 
A CLASS DIAGRAM (Fig. 1) provides a good visual representation of the 
observer's experience, that is, of his collective of N experiments, where the 
algebra of classes applies. An experiment is represented by a point, a state $x_i$
by a single exclusive area. Any set of state areas may be regarded as an event
type, though actually only a few of the possibilities would be likely selections
for objective recognition as an event type. The following notations will be used:

$x_i \cdot x_j$ means $x_i$ and $x_j$, that is, the occurrence of both $x_i$ and $x_j$.
$x_i \vee x_j$ means $x_i$ or $x_j$, that is, the occurrence of either $x_i$ or $x_j$ or both.
$n$ = the number of elements in X (the total universe of discourse).
$\in$ means “belongs to.”
$X$ means the domain of all event types $x_i \in X$, that is, $x_1 \vee x_2 \vee x_3 \vee \dots \vee x_n$.

² H. Reichenbach, The Theory of Probability. Berkeley and Los Angeles: University of
California Press, 1949. Chapter 2, “Introduction to Symbolic Logic.”

Any experimental event type in the system, for example $\xi_j$, may be expressed
as some function of states of X connected by an “or.” Thus we may define

[equation illegible in the source]

Conversely the event types $x_i$ may be expressed as functions of the $\xi$'s or of their
complements $\xi'$, where the latter denotes the nonoccurrence of $\xi$. Thus in
Figure 1, three observations may determine a state:

[equation illegible in the source]

Notice also that

$X = \xi \dots$ [remainder of equation illegible in the source]

and $x_0$ represents a state in which none of the event types appears. Unless we
were observing a larger system, this would be the set of experiments in which 
nothing happened. Often such a state is excluded from the system. In general, 
the numbers of experiments vary among the states, and some may be empty. 
Emptiness and nonexistence are observationally equivalent. 
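
The algebra of classes used here can be imitated with ordinary sets. The following is a minimal sketch, assuming three hypothetical event types (called `e1`, `e2`, `e3` below, names that do not come from the paper) and taking a state to be one particular combination of their occurrence and nonoccurrence.

```python
from itertools import product

# A state records, for each of three hypothetical event types, whether it
# occurred. The eight combinations are mutually exclusive and exhaust X.
STATES = list(product((False, True), repeat=3))

def event_type(index):
    """An event type as the set of states in which it occurs (an "or" of states)."""
    return {state for state in STATES if state[index]}

def complement(states):
    """Nonoccurrence: every state of X outside the given set."""
    return set(STATES) - states

e1, e2, e3 = (event_type(i) for i in range(3))

# Three observations determine a state: e1 occurred, e2 did not, e3 occurred.
determined = e1 & complement(e2) & e3

# The state in which none of the event types appears.
nothing_happened = complement(e1 | e2 | e3)

print(determined, nothing_happened)
```

Here “and” is set intersection, “or” is set union, and the whole domain X is recovered by joining the three event types with the state in which nothing happened.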


Basic Assumptions 


WE ARE NOW in position to state our basic assumptions.

Assumption I: All quantitative conceptual phenomena that involve only ob- 
jective events and are universally relevant to objective events will be derivable 
by an observer confined to any typical objective system defined by his experience. 

Assumption II: Any complete experiment of an objective system belongs 
to one of a set X of mutually exclusive states x, which defines the system. 


Assumption III: The observer's experience of the system X is completely 
comprised in his knowing: (1) the states comprising X and (2) the results of 
a set of N independent experiments (called a collective). 

The N experiments must be regarded as independent elements, or at least 
independent memories of the observer's experience. If he remembered correlations, 
this would automatically imply that he was taking note of some larger 
system within which lies the system he is supposed to be dealing with to the 
exclusion of all other things. The procedure corresponds to the physical concept of 
isolating a system for study. 

Besides these assumptions there are certain desiderata which apply to the 
quantitative properties in terms of which objective systems and the events and 
symbolic descriptions associated with them may be discussed. Our problem, of 
course, is the discovery of such quantitative properties. They must be 


(1) Quantitative, i.e., have a measure which leads to a number. 

(2) Universal, i.e., such a number must be derivable for any objective system. 

(3) Characteristic, i.e., the number must be attributable to and depend on 
the particular parts of the objective system in question. It must remain fixed 
when the system is fixed. 

(4) Observational, i.e., it must measure a behavior characteristic, not a 
merely formal one, and so depend not merely on defining, but on observing the 
system. 


Potential Data Limited to Counting Procedures 
AN EVENT has a certain purely objective quality which, in fact, is the event 
and which sometimes involves objective quantities such as volume, weight, 
etc. But these are not universal. No universally-present quantitative attribute can 
be found in the event itself. We must therefore look to the abstract qualities of 
every objective event for our quantitative phenomena. 
The only abstract quality of every objective event is distinguishability. 
Distinguishability admits only of counting and class distinction of events as a source 
of quantitative universal properties of events or system of events. It therefore 
admits only the study of the kind of abstract material exhibited in our class 
diagrams. 

Only the counting process is quantitative, but the counting may be organized 
in relation to classes of experiments in various ways according to the quantitative 
phenomenon being evaluated. We may proceed to examine conceivable counting 
processes and further apply our criteria. 

Whatever quantitative function we may be interested in must be constructed 
of elemental functions, each of which depends on a count of something that 
must be in evidence in the diagram of the collective of experience. There are 
only two kinds of things to count, experiments within some class (experiences 
of some event type) or subclasses within some class (ways of classifying). 

Consider subclasses within a class of event types. Subclasses that are not 
states are determined by the states, which are the only definite classes that represent 
objectively observable event types in all cases. States may be defined by 
some selected set of subclasses that are not states, but there is no universal 
process for selecting such a set, or any other set, to be counted. Since a definition 
of a system by a set of $n$ actually occurring event types formally implies $2^n$ 
conceivable states, many of which never actually occur, it is apparent that the 
classes in general represent formal characteristics. The observational behavior 
characteristics of classes or subclasses so defined are inherently the actual experiments 
contained, and not the classes. Counting of classes or subclasses as such 
must be ruled out as potential data of the kind we seek. 

It is concluded that all properties of a system which are quantitative, uni- 
versal and observational characteristics must arise as functions of the counts of 
experiments in classes of the collective experience. 
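
The contrast between formally conceivable states and observed counts may be illustrated by a minimal sketch. The event-type names, the toy collective, and the helper names below are hypothetical, chosen only for illustration; the sketch shows that a handful of event types implies a much larger set of conceivable states, while the observational data are simply the counts of experiments falling in each state.

```python
from collections import Counter
from itertools import product

# Hypothetical event types and a toy collective of experiments.
# Each experiment is recorded as the set of event types that occurred in it.
EVENT_TYPES = ("a", "b", "c")
collective = [
    {"a"}, {"a"}, {"a", "b"}, {"b"}, {"a", "b"}, set(), {"a"}, {"b"},
]

def conceivable_states(event_types):
    """All occurrence/nonoccurrence patterns of the given event types."""
    return [
        frozenset(e for e, present in zip(event_types, pattern) if present)
        for pattern in product((False, True), repeat=len(event_types))
    ]

def state_counts(experiments):
    """Count the experiments falling in each state (the observational data)."""
    return Counter(frozenset(e) for e in experiments)

states = conceivable_states(EVENT_TYPES)
counts = state_counts(collective)
occupied = [s for s in states if counts[s] > 0]

print(len(states))    # number of conceivable states
print(len(occupied))  # number of states that occur in this toy collective
```

In this toy collective half of the conceivable states are empty, which is the sense in which counting classes as such is a formal rather than an observational procedure.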


The Situations of Primary Interest 


HAVING LOOKED into the possible sources of the right kind of properties, 
we must now inquire into the basic types of interest that we may have in 
an objective system to discover what correspondences there may be between the 
interests and the properties. 

We have remarked before that intellectual activity concerns relations between 
experiences, expectations, purposes, and conduct. Purposes and conduct involve 
human values which we have ruled out of our subject matter, so the properties 
we seek will not be in the fields of purposes or conduct, but in the fields of 
experiences and predictions. Purposes, however, do lead to interests concerning 
experiences and the expectations that result from experience. The most im- 
portant interests are relevant to life in general, and so may be expected to corre- 
spond to automatic intuitive concepts that appear in the mind in relation to ex- 
periences, expectations, and fulfillments of expectations, often without benefit 
of conscious analytical thought. 

The three principal standpoints from which experiences are viewed by the 
mind are as past, as present, and as future. The past is in memory, the present 
is actual and is automatically related to the past, and the future is hypothetical 
and is automatically related by the mind to both past and present. 

Interest in the past for its own sake is a passive matter of feeling values only. 
The present is active, however, a process of transition from future to past. The 
future is the region of expectations. For a given system the present involves a 
situation in which part of an experiment has been observed and expectations 
are changing as events occur. 

In addition to past, present, and future viewpoints, there are different de- 
grees of detail in the interest that the mind may have in the system. Thus, we 
may wish to consider a class with no distinctions in its contents, or as a sub- 
system, with its subclass structure recognized. 

The principal quantitative properties of interest arise naturally from con- 
sidering the situations involving the least and the most detail. These may then 
be examined from the past, present, and future standpoints to see what in- 
tuitional concepts they give rise to. Then, the basic ideas having been estab- 
lished, further questions, as of intermediate degrees of detail, may be explored 
in a similar manner as desired. 
The basic situation involving minimum detail is the actual event situation, 
in which each event of a sequence merely limits the scope of the still uncom- 
pleted part of the experiment, but implies nothing about the details of the still 
incomplete part. The situation of maximum detail corresponds to the general 
question of what might happen in a future experiment. 


The Actual Event Situation 


IN THE LIGHT of the foregoing, we are now ready to seek the functions which 
form the basis of information theory. 

An event is always observed in the context of prior events. If event type $r$ 
has occurred and event type $\rho$ now occurs, we should like to establish measures 
of this transition with the correct general properties and perhaps some specific 
ones. Neither $r$ nor $\rho$ refers us to detailed content of the collective classes. The 
events when observed simply define an over-all class observed so far, details being 
as yet unobserved. We may then begin with some function $f[n(r), n(r \cdot \rho)]$ 
as including all counts that can be involved. Here $n(r)$ represents the number 
of experiments of the $r$ class, and $n(r \cdot \rho)$ the number of experiments of the 
$r \cdot \rho$ class (the intersection of $r$ and $\rho$). Let us now define 

$$p(\rho/r) = n(r \cdot \rho)/n(r).$$

In other words, $p(\rho/r)$ is a number associated with $\rho$ and $r$. Then our function 
$f$ can be written as $f[n(r), p(\rho/r)\,n(r)]$ or, since it is a function of $n$ and $p$, 
as $\phi[n(r), p(\rho/r)]$. 

But $p(\rho/r)$ is a fixed observational characteristic of occurrence of the event 
$\rho$ after $r$ has occurred, derived from counts in the collective, whereas $n(r)$ varies, 
representing merely the number of experiments of the $r$ class. Hence $n(r)$, 
being explicit and the sole argument not fixed by system behavior, must vanish 
from the function if $\phi$ is to be a fixed characteristic. Therefore 

$$\phi[n(r), p(\rho/r)] = g[p(\rho/r)]$$

is a function of the single parameter $p(\rho/r)$. 
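
The argument that the bare count must drop out can be made concrete with a small sketch. The counts and the function name below are hypothetical; the sketch only illustrates that the ratio of the joint count to the prior count is unchanged when the same experience is accumulated several times over, whereas the counts themselves are not.

```python
from fractions import Fraction

def relative_frequency(n_joint, n_prior):
    """Ratio of the count of the joint class to the count of the prior class."""
    return Fraction(n_joint, n_prior)

# Hypothetical counts: 40 experiments in the prior class, 10 of them also
# showing the newly observed event type.
n_prior, n_joint = 40, 10
p = relative_frequency(n_joint, n_prior)

# Accumulating the same experience three times over changes both counts
# but leaves the ratio, the behavior characteristic, unchanged.
p_scaled = relative_frequency(3 * n_joint, 3 * n_prior)

print(p, p_scaled)
```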

Since a sequence of events may be expressed as one total event, and the 
intervening event types chosen for observation may differ but lead to the same 
total event type, it is clear that it is desirable that there be a combination law 
for event type transition measures, such that consistency is preserved. This law 
must be a binary operation $\circ$ such that 

$$g[p(r/x)] \circ g[p(\rho/r)] = g[p(r \cdot \rho/x)].$$

Now addition and multiplication are the simplest mathematical operations. 
For the multiplication law, the type of possible functions is determinable as 
follows: First, note that $r = x \cdot r$ because $x$ is the total class covering the system. 
Then the condition is 

$$g\!\left[\frac{n(r)}{n(x)}\right] g\!\left[\frac{n(r \cdot \rho)}{n(r)}\right] = g\!\left[\frac{n(r \cdot \rho)}{n(x)}\right].$$

If we take partial derivatives with respect to $n(r)$, we obtain 

$$\frac{1}{n(x)}\, g'\!\left[\frac{n(r)}{n(x)}\right] g\!\left[\frac{n(r \cdot \rho)}{n(r)}\right] - \frac{n(r \cdot \rho)}{n(r)^2}\, g\!\left[\frac{n(r)}{n(x)}\right] g'\!\left[\frac{n(r \cdot \rho)}{n(r)}\right] = 0,$$

or, replacing $p$ values and rearranging, 

$$p(r/x)\,\frac{g'[p(r/x)]}{g[p(r/x)]} = p(\rho/r)\,\frac{g'[p(\rho/r)]}{g[p(\rho/r)]}.$$

But $p(r/x)$ and $p(\rho/r)$ may vary independently. Therefore each side of the 
equation above must equal the same constant. So in general, we shall have 

$$g'(p)/g(p) = k/p, \quad \text{or} \quad g(p) = a\,p^k \quad (a, k \text{ constant}).$$

But the original condition shows that $a = 1$. Then 

$$g(p) = p^k.$$


We thus see that in order for the multiplication law to be obeyed, the 
function $g$ must be a power function. We naturally choose the simplest representative 
of this function type, with 

$$k = 1.$$

We note in passing that we could impose the condition $k = 1$ by demanding 
the satisfaction of the addition law for $g(p)$ for the case of combining mutually 
exclusive classes. 

The function $p(\rho/r) = n(r \cdot \rho)/n(r)$ is then a simple function obeying 
the desired combination law under multiplication, namely, 

$$p(r/x)\,p(\rho/r) = p(r \cdot \rho/x).$$

This (oddly enough!) is the well-known “probability” function with its well-known 
combination law. 
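
As an illustration only, the multiplicative combination law can be checked directly on counts. The counts below are hypothetical, and exact fractions are used so that the equality is not blurred by rounding; the last check shows that any power of the ratio also combines consistently under multiplication, which is why a further choice of exponent is needed.

```python
from fractions import Fraction

# Hypothetical counts in a collective of experiments.
n_x = 200          # all experiments in the system
n_r = 40           # experiments in which the first event type occurred
n_r_and_rho = 10   # experiments in which both event types occurred

p_r_given_x = Fraction(n_r, n_x)
p_rho_given_r = Fraction(n_r_and_rho, n_r)
p_both_given_x = Fraction(n_r_and_rho, n_x)

# The product of the two transition measures equals the measure of the
# total transition, whichever intermediate event type was observed.
print(p_r_given_x * p_rho_given_r == p_both_given_x)

# A power function of the ratio multiplies consistently as well.
k = 3
print(p_r_given_x**k * p_rho_given_r**k == p_both_given_x**k)
```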

Now having evolved in one of the simplest ways, a mathematical measure 
function, we may strongly suspect that there will be a correspondence of this 
function with a natural mental concept and a good English word. This is indeed 
the case. The concept is obviously expectation. We have been using it right 
along. Mathematical probability is the mathematical counterpart of the natural 
intuitive concept of expectation. It is important to notice that expectation is the 
natural quantitative attribute that the mind automatically associates with a future 
event. When an event happens in the present, expectation as such is forgotten. 
Let us, however, return to our event situation. 


Let us now consider the additive combination law 

$$g[p(r/x)] + g[p(\rho/r)] = g[p(r \cdot \rho/x)].$$

But, as we have shown, 

$$p(r \cdot \rho/x) = p(r/x)\,p(\rho/r).$$

Therefore, 

$$g[p(r/x)] + g[p(\rho/r)] = g[p(r/x)\,p(\rho/r)].$$

The two arguments are independent, whence, differentiating partially, 

$$g'[p(r/x)] = g'[p(r/x)\,p(\rho/r)]\,p(\rho/r),$$
$$g'[p(\rho/r)] = g'[p(r/x)\,p(\rho/r)]\,p(r/x).$$

Therefore, 

$$g'[p(r/x)]\,p(r/x) = g'[p(\rho/r)]\,p(\rho/r) = A \quad (\text{constant}).$$

Hence, 

$$g[p(\rho/r)] = A \log p(\rho/r) \quad (A \text{ negative})$$

is the general formula, the integration constant being zero. We are thus led to 
the conclusion that in order to obey the additive law, $g$ must be a constant 
times the logarithm of $p$. We here impose the simple requirement that the 
function be positive. This is the unique function fulfilling the requirements. 
$A$ is a mere scale factor and may be chosen as minus unity. Then 

$$g[p(\rho/r)] = -\log p(\rho/r),$$

which is the function sought. 


The Surprisal Property of Present Events 


AS WE HAVE FOUND another simply-evolved mathematical measure function, 
we may again suspect that it will correspond with a natural mental 
concept and a good English word. Again this is the case. The concept is 
surprise. But we have not been using it right along. It has not to the writer's 
knowledge been used in the theory of information, presumably because the 
significance of the $-\log p$ function as a measure of surprise has not been 
recognized. The concept of surprise has been considered by Weaver, however.* 


*W. Weaver, “Probability, Rarity, Interest, and Surprise,” The Scientific Monthly, 
Vol. 67 (December, 1948). Weaver makes a very interesting proposal for a surprise index, 
as $(SI)_i = (1/p_i) \sum_k p_k^2$. His remarks cover the kind of considerations which might be used 
as grounds for objection to our identification of the surprise concept. Weaver correctly 
assumes that we will not be surprised unless we are interested, and rarity favors lack of 
interest. But interest is a human value phenomenon, a subjective event of the kind we have 
excluded from our analysis as of a different logical type. We have established a proper 
logical uniformity insofar as interest values are concerned. Further, the degree of detail 
of our system of interest is specified in principle. In a card game one does not in fact play 
in a system of real experience of six billion odd hands, but only of a small set of classes of 
A sense of surprise arises in the present when an event happens, just as a sense 
of color might arise. The degree of surprise is zero for an event that was certain, 
increases monotonically as the expectation decreases, becomes very large as the 
expectation approaches zero. It increases roughly additively for a sequence of 
events to some fairly definite total. Its lack of an upper limit is indicated by 
such degree words as astounding. In fact, people have been surprised to death. 

It is proposed that the measure of surprise be called surprisal. 

Probability and surprisal are the two basic functions on which the theory of 
information is founded. They represent the future and the present quantitative 
aspects of an event. 
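
The properties claimed for surprisal can be checked numerically. The following is a minimal sketch, assuming natural logarithms (the base only sets the scale) and hypothetical probabilities; the function name is ours.

```python
import math

def surprisal(p):
    """Negative logarithm of a probability, written as log(1/p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return math.log(1.0 / p)

# Hypothetical probabilities of two successive event transitions.
p_first, p_second = 0.2, 0.25
total = surprisal(p_first * p_second)
parts = surprisal(p_first) + surprisal(p_second)

print(math.isclose(total, parts))        # additive over a sequence of events
print(surprisal(1.0))                    # a certain event
print(surprisal(0.01) > surprisal(0.5))  # rarer events give larger values
```

No upper limit is built in: as the probability approaches zero the value grows without bound, and a probability of exactly zero is rejected as never observed.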


System Measures 


WE NOW WISH to derive an over-all system measure as representing the case 
of maximum detail. It must involve all the states of the system, and be of 
the form 

$$f[n(x_1), n(x_2), n(x_3), \dots].$$

But this may be re-expressed 

$$f = f[p(x_1/x)\,n(x),\ p(x_2/x)\,n(x),\ p(x_3/x)\,n(x),\ \dots] = \phi[n(x), p(x_1/x), p(x_2/x), p(x_3/x), \dots].$$

But $n(x)$ is not a system characteristic, while $p(x_i/x)$ is. Hence $n(x)$ must 
vanish from the function. Therefore the system measure must have the form 

$$H(X) = \phi[p(x_1/x), p(x_2/x), p(x_3/x), \dots].$$


Two more properties are required on perfectly general grounds: (1) $H(X)$ 
must be symmetrical in the state probabilities because the relation of states to 
the system is symmetrical. It follows that $H(X)$ has a stationary value when 
the state probabilities are equal. (2) A zero state probability among $n$ states 
must yield the same function as the system of $n - 1$ states with the zero-probability 
state eliminated, because, in objective observations, zero probability is 
observationally equivalent to non-existence. Expression of the resulting form of 
$H(X)$ may be possible but is not necessary here, because these conditions are 
covered by further conditions given later. 

Further conditions arise from natural mental convenience. 


A natural system measure will arise if there is a natural typical question and 


hands, and notes surprises in relation to these. Just as anxiety, a human valuation 
phenomenon, is related to uncertainty, so may the present view of anxieties, such as relief, 
shock and pleasure, be related to surprise and perhaps be semantically difficult to 
disentangle. Even if human values theories are under consideration, however, the writer does 
not believe a surprise index of unity can be satisfactory for a certain event, as Weaver's 
would be. No amount of shock value yields surprise in such a case, and only a zero is 
acceptable. 
mental procedure. A natural question involving the system as a whole in detail, 
is, what is to be expected in a single future experiment. Since we are interested 
only in what is quantitative, the question becomes, what quantitative impression 
can we expect of a future experiment, when it is actually observed. Now when 
the experiment is actually observed, the quantitative impression will be surprise. 
A natural system measure, therefore, will be describable as the expectation of 
surprise for a complete future experiment. 

The resulting estimate, however, appears to the mind, not as a partly 
analyzed concept associated with the phrase “expectation of surprisal,”’ but as a 
whole concept associated with a single word. The whole concept is uncertainty. 
By analyzing the uncertainty concept we may derive the desired useful mathe- 
matical formula and show that it is the expectation of surprisal and fulfills the 
general conditions of a measure function. That is, we may show that 

system uncertainty = expectation of surprisal. 


Analysis of Natural Uncertainty 


UNCERTAINTY is a sort of tension in the mind resulting from the contempla- 
tion of various results in some set of future observations. 

Let the uncertainty of an objective system be $U(X)$. It is naturally formulated 
by a survey of the states in which the same quantitative property is noted 
for each state. Any such property of a state must, as we have shown, be a 
function of the counts in the classes concerned. Only the state itself, $x_i$, and the 
system class $x$ can be concerned, so the function may be denoted $\phi(N, n_i)$, where 
$N$ is the system count and $n_i$ the state count. Now 

$$\phi(N, n_i) = \phi(N, p_i N) = \psi(N, p_i),$$

where $p_i$ is the state probability. Now $p_i$ is a true behavior characteristic of the 
state, and $N$ is not; so $N$ must vanish from $\psi$. Hence, 

$$\psi(N, p_i) = h(p_i).$$

Now the state properties are compiled to form $U(X)$ by a series of increment 
operations, one for each state, in which the increment for each state is unaffected 
by its order in the series, or by any other state. The increments then are state 
properties, and must be the ones noted, the $h(p_i)$. Hence, 

$$U(X) = \sum_i h(p_i)$$

is the first property of system uncertainty of major import. 

A second major property may be discerned by considering two independent 
objective systems X and Y. The systems are independent in the natural observa- 
tional sense that observations of the combined systems have shown that one is 
not influenced by the other. Now if an observer has observed the systems sepa- 
rately and knows they are independent, he will naturally estimate the uncertainty 
of the pair of systems simply by regarding $U(X)$ and $U(Y)$ as two increments 
unaffected by each other or by order, obtaining $U(X) + U(Y)$. But if he 
had not observed the systems separately and had not noted their independence, 
he would estimate the uncertainty of the pair of systems as that of the system 
$XY$, the set of states consisting of all possible combinations $x_i \cdot y_j$ of states 
of $X$ with states of $Y$. Since the results of the two views must be consistent, 
we must have, for independent systems, 

$$U(XY) = U(X) + U(Y).$$

The two major properties are sufficient to define $U(X)$ to within a constant. 
Let $q_j$ be the probability of the $j$th state of set $Y$. Then the expectations in 
$X$ being independent of those in $Y$ means that the probability of state $x_i \cdot y_j$ 
in $XY$ is $p_i q_j$, so our first condition yields 

$$U(X) = \sum_i h(p_i), \qquad U(Y) = \sum_j h(q_j), \qquad U(XY) = \sum_i \sum_j h(p_i q_j),$$

and our second condition becomes 

$$\sum_i \sum_j h(p_i q_j) = \sum_i h(p_i) + \sum_j h(q_j).$$



We may find $h$ by considering a special case. Suppose there are $m$ states of 
probability $1/m$ in $X$ and $n$ states of probability $1/n$ in $Y$. Then by our last 
equation 

$$mn\,h(1/mn) = m\,h(1/m) + n\,h(1/n).$$

Define $f$ by 

$$f(x) = x\,h(1/x).$$

Then we may write 

$$f(mn) = f(m) + f(n).$$

But $m$ and $n$ are independent. Therefore, differentiating with respect to $n$ 
and $m$ in turn yields 

$$m f'(mn) = f'(n); \qquad n f'(mn) = f'(m),$$

so that 

$$m f'(m) = n f'(n).$$

Again, since $m$ and $n$ are independent, each side of the last equation must equal 
the same constant, say $k$. Then 

$$f'(m) = k/m \quad \text{or} \quad f(m) = k \log m + C.$$

Furthermore $C = 0$ by the undifferentiated equation. Hence 

$$f(m) = k \log m$$

and 

$$m\,h(1/m) = k \log m,$$
$$h(p_i) = -k\,p_i \log p_i.$$


The constant $k$ determines the sign and the scale factor of the measure. 
Since uncertainty is a positive notion, and the magnitude of $k$ is equivalent to 
a choice of logarithm base, we choose $k = 1$, so that 

$$h(p_i) = -p_i \log p_i$$

and 

$$U(X) = -\sum_i p_i \log p_i.$$
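
The additivity condition from which this form was derived can be verified numerically. The sketch below is an illustration under stated assumptions: the state probabilities of the two systems are hypothetical, natural logarithms are used, and the function name is ours. It checks both the general condition for independent systems and the equiprobable special case used in the derivation.

```python
import math
from itertools import product

def uncertainty(probs):
    """Sum of -p log p over the states; empty states contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical state probabilities for two independent systems.
p_x = [0.5, 0.3, 0.2]
q_y = [0.6, 0.4]

# States of the combined system are pairs, with probabilities p_i * q_j.
p_xy = [p * q for p, q in product(p_x, q_y)]

print(math.isclose(uncertainty(p_xy), uncertainty(p_x) + uncertainty(q_y)))

# The equiprobable special case: m states in one system, n in the other.
m, n = 4, 5
combined = uncertainty([1.0 / (m * n)] * (m * n))
print(math.isclose(combined, math.log(m) + math.log(n)))
```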


But this is precisely Shannon's H (uncertainty, entropy, information, etc.). 
We may now observe that U(X) (or the H function of information theory) 
is indeed the expectation of surprise, for $h(p_i)$ is the expected contribution 
per experiment, of the surprisal of state $i$, and a similar contribution may be 
expected from every state. U(X) becomes the total expectation of surprise for 
the whole system. This is simply the surprisal averaged over all experiments in 
the collective. Uncertainty is then the average surprisal per experiment. The 
function also fulfills the general requirements for a system measure. 
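
The identity of uncertainty with average surprisal per experiment can be seen by computing both from one collective. The collective below is hypothetical, and base-2 logarithms are an assumption made so that the result comes out in bits.

```python
import math
from collections import Counter

# Hypothetical collective: the state observed in each of N experiments.
collective = ["x1"] * 8 + ["x2"] * 4 + ["x3"] * 2 + ["x4"] * 2
N = len(collective)
counts = Counter(collective)
prob = {state: n / N for state, n in counts.items()}

# Surprisal of each experiment, averaged over the whole collective.
average_surprisal = sum(-math.log2(prob[s]) for s in collective) / N

# The same quantity computed from the state probabilities alone.
uncertainty = -sum(p * math.log2(p) for p in prob.values())

print(average_surprisal, uncertainty)  # both come to 1.75 bits
```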


Are There Other Quantities of Natural Interest? 


We may now inquire whether there are other basic quantities of natural 
interest, such that they may be regarded as natural intuitive concepts. Are there 
any more objective counterparts related as influence is to surprise? What about 
intermediate degrees of detail? 

The observer lives in the present, and receives active event impressions, as 
surprise, color or weight, only in the present. Only present events give rise to 
a sense of material objectivity. Only for concepts about the present will there 
be any tendency to seek, or be interested in, an objective counterpart of sub- 
jective impressions. Thus for surprise we found influence, because of its present 
character, and naturally sought an objective counterpart. For uncertainty and 
expectation, which are concepts about the future, the search would be mean- 
ingless. This is why entropy, which is related to uncertainty and is introduced 
in physics mathematically, is difficult to those who try to get an objective idea 
of it. 


There is no detailed present interest in the collective, for only events happening 
create present impression, and these merely change the scope of the future. 
Only the change, which leads to surprise, concerns the present. The detail of 
interest concerns only the future viewpoint and will be regarded as an uncertainty 
of a subsystem. A possible event may be viewed as future, leading to expectation. 
Thus, as future, we have concepts for both detail and mere change of 
scope in our interest. 

The past is not of direct quantitative interest in itself, and serves only to 
provide data for present and future quantitative impressions, with which the 
mind is actively concerned. One may examine the past after the manner of an 
object for mathematical analysis, and impose mathematical concepts without 
natural counterparts. One would then doubtless find the same basic functions but 
give them unnatural names! 

It would appear from these remarks that expectation, surprise, and un- 
certainty exhaust the basic natural quantitative concepts. We naturally examine 
systems in various ways, combining or subdividing them, considering larger and 
smaller related systems, and make analyses of relations, but it is done naturally 
in terms of the concepts already described. It is reasonable to expect, and appears 
in the work of Shannon, that the mathematical counterparts will be corre- 
spondingly useful. 

The pursuit of quantitative concepts concerning the details of future behavior 
of a system has led the author to a combined theory of questions and un- 
certainties (in press as an Air Force Report) substantially extending the 
mathematical definition of uncertainty to correspond to the natural concept of 
uncertainty that goes with any objective question. Shannon's conditional entropy 
fits into the theory and is generalized. It is shown that when the actual event 
situation described above is viewed as a future possibility, the uncertainty measure 
is equal to the surprisal. 





Every process, event, happening—call it what you will; in a 
word, everything that is going on in Nature means an increase of 
the entropy of the part of the world where it is going on. Thus a 
living organism continually increases its entropy—or, as you may say, 
produces positive entropy—and thus tends to approach the dangerous 
state of maximum entropy, which is death. It can only keep aloof 
from it, i.e., alive, by continually drawing from its environment 
negative entropy . . . What an organism feeds upon is negative en- 
tropy. Or, to put it less paradoxically, the essential thing in metabol- 
ism is that the organism succeeds in freeing itself from all the entropy 
it cannot help producing while alive. 


E. SCHROEDINGER, What Is Life? 





COMMUNICATION THEORY AND 
METHODS OF FIXING BELIEF 


JOHN R. KIRK * 


— AND BRAWN-WISE, man is so delicate a creature that in floods, fires, 
famines, hurricanes, earthquakes, and from the rigors of heat and cold, he 
would long ago have perished were it not for the artificial environments he con- 
stantly creates and continuously controls. What controls man? To what can be 
attributed his success in battling those natural forces which drove the dinosaur 
and myriad other species to early extinction and which today consign even the 
great apes, despite their apposable thumbs, to confining geographic areas and 
the ignominy of man-made zoos? Citation of man’s great brain is common answer 
to both questions; it is a true answer, but a partial one. 

Once the neurologist goes behind the scenes of the celebrated human cortex 
and compares the scaffolding there with the ganglion of, say, the moth, he notes 
decided differences. The chief difference, of course, is in absolute size. Of all 
the animals, the species homo sapiens has the largest nerve-network. Also im- 
portant is the relative size of man’s brain with respect to his body; the walnut- 
sized brain of the brontosaur made the latter a quite stupid animal, but in a cat 
the same brain might well bestow a wisdom on the feline in excess of that 
granted by legend. Finally, there is complexity of neural connection. In concert, 
these three factors fit man superbly for handling and storing superlative amounts 
of information. 


The amount of information associated with a stimulus has to do with the 
total number of stimuli which can be responded to. Thus, in the simplest case 
of an automatic device that can respond only to two stimuli, ‘‘on” and “off,” 
which occur independently and with equal frequency, the amount of informa- 
tion in each stimulus is just one “bit.” Generally, the more complex the response 
repertoire of an organism or a machine, the more information is carried by the 
signals that stimulate it. 
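
For the simplest case described above, the amount of information per stimulus is the base-2 logarithm of the number of stimuli. The sketch below assumes, as the text does, that the stimuli occur independently and with equal frequency; the function name and the 26-signal repertoire are hypothetical.

```python
import math

def bits_per_stimulus(num_stimuli):
    """Information per stimulus for independent, equally frequent stimuli."""
    if num_stimuli < 1:
        raise ValueError("need at least one stimulus")
    return math.log2(num_stimuli)

print(bits_per_stimulus(2))    # the on/off device
print(bits_per_stimulus(26))   # a hypothetical repertoire of 26 signals
```

When the stimuli are not equally frequent, the logarithm of their number is only an upper bound on the information carried.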

The capacity of a channel of communication, be it a wire, the air, or a 
certain band of radio frequencies, is strictly measured by the maximum amount 
of information that can be pumped through it in a unit of time; there is always 
such a maximum. Up to a point, the telegrapher can increase the amount of 
information conveyed per unit time simply by shortening the dots and dashes 

* Dr. Kirk has taught symbolic logic, semantics, and the philosophy of science at 
The University of Texas and is currently on the staff of KTBC-TV in Austin. 
and spaces in between. But, even with the use of automatic sending and recording 
equipment, there is a limit to this process. The dots and dashes cease to be 
discriminable, the capacity of the channel has been overloaded, and general 
confusion results. Given a sophisticated audience, a lecturer can convey vast 
amounts of information in a short time by the use of abstract and technical 
terms. The same lecture delivered to an audience of smaller semantic capacity 
may completely fail of its purpose. The audience may not be information-filled 
to the limit of its capacity, the excess information spilling harmlessly like water 
from an over-filled glass; rather, the audience will be confused and carry home 
less information than it would have if the lecturer had pitched his talk within 
hailing distance of the vernacular. The tailoring of a message with respect to 
the capacity of a channel so that the latter will be effectively exploited without 
overcrowding is termed “coding.” 


A bugaboo of communication is noise, a ghostly hand that claws at a message 
somewhere between source and destination, changing it into obvious nonsense 
or, what is worse, converting it into another meaningful but entirely misleading 
message. Channels are never without some noise: acoustic noise in an auditorium, 
electrical noise in a wire, electromagnetic noise (static) in the ether, 
semantic noise in the neural network of a human being trying to understand. 
And the only effective weapon against noise is some form of redundance. For 
safety’s sake, tower operators at our airports, guiding pilots to safe landing, 
invoke a simple form of redundance: they repeat the same message over and 
over. The newsman, deciphering a garbled communiqué in the teletype, profits 
from the subtler redundance woven into the message by the rules of language. 
Redundance can go to waste, but clever coding will put redundance to work 
battling the noise. That redundance always lowers the amount of information 
conveyed by a set of stimuli is an inconvenience we are often happy to tolerate 
in order to escape the masking effects of noise. 
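
The tower operator's device, plain repetition, can be sketched as a code. This is an illustration under stated assumptions, not a description of any actual signaling practice: the message is a string of binary symbols, each symbol is sent three times, the channel alters each transmitted symbol independently with a hypothetical probability of 0.05, and the receiver takes a majority vote. All names are ours.

```python
import random

def encode_repeat(bits, times=3):
    """Repeat every bit a fixed number of times (the simplest redundance)."""
    return [b for b in bits for _ in range(times)]

def add_noise(bits, flip_prob, rng):
    """Flip each transmitted bit independently with probability flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

def decode_majority(received, times=3):
    """Recover each bit by majority vote over its block of repeats."""
    blocks = (received[i:i + times] for i in range(0, len(received), times))
    return [1 if sum(block) * 2 > times else 0 for block in blocks]

def error_rate(sent, got):
    """Fraction of positions in which the two sequences differ."""
    return sum(s != g for s, g in zip(sent, got)) / len(sent)

rng = random.Random(0)  # fixed seed, for repeatability
message = [rng.randint(0, 1) for _ in range(10_000)]

plain = add_noise(message, 0.05, rng)
coded = decode_majority(add_noise(encode_repeat(message), 0.05, rng))

print(error_rate(message, plain))  # roughly the channel's flip probability
print(error_rate(message, coded))  # lower, bought with three times the symbols
```

The price is the one named in the text: three symbols are spent to convey what one symbol could carry over a noiseless channel.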

Oddly enough, noise will add information to a message, but it will be 
spurious information. This brand of information can be recognized and corrected 
for only if it crops up as an obviously nonsensical message; a message 
lacking redundance, should it be altered by noise, would allow the resultant 
spuriousness to go undetected. The particular character of a message which 
distinguishes it from nonsense and from other messages is termed “content of 
information.” One practical problem in all communication is the preservation of 
the same content of information from source to destination. 
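
Why spuriousness goes undetected without redundance can be shown with a single check symbol. The sketch assumes binary messages and an even-parity convention; the message and the function names are hypothetical. Without the check symbol every pattern is a possible message, so an altered pattern is accepted; with it, a single alteration produces a pattern no source could have sent.

```python
def add_parity(bits):
    """Append one redundant bit so that the total number of ones is even."""
    return bits + [sum(bits) % 2]

def looks_valid(word):
    """A received word passes the check when its number of ones is even."""
    return sum(word) % 2 == 0

def flip(word, index):
    """Simulate noise altering a single position."""
    return word[:index] + [word[index] ^ 1] + word[index + 1:]

message = [1, 0, 1, 1]  # hypothetical four-bit message

# Without redundance the altered pattern is itself a possible message,
# so the receiver has no ground for suspecting it.
garbled_plain = flip(message, 2)

# With a parity bit, a single alteration fails the check and can at
# least be recognized as spurious.
sent = add_parity(message)
garbled_coded = flip(sent, 2)

print(looks_valid(sent), looks_valid(garbled_coded))
```

A single parity symbol only detects the alteration; correcting it requires more redundance, as in repetition.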


yer ATION THEORY applies powerfully to all modes of communication 


whether by television, ordinary speech, or Amerindian smoke-signal, and 
whether it be between organisms (human beings), between machines, or between 
organism and machine. Normally, the semanticist and linguistician would narrow 
inquiry to speech and, depending upon their purposes of analysis, choose as 
elementary stimuli the letters or words in written speech or, in oral speech, the 
syllables, phonemes, or even the sine-waves of Fourier analysis. What choice is 
made will have notable effects on the numerical value of the amount of informa- 
tion carried by a unit. But suppose we refuse such choice and consider the 
natural stimuli, those composing messages from the outside world and dis- 
criminated by our sense-receptors. Such messages are highly redundant, and it 
is this redundancy which corresponds with our psychological conviction that there 
exists an external world. “Pink elephant” messages will ordinarily lack the
reiteration of those we ascribe to real objects; if and when they do not, we will
ascribe reality to pink elephants. 
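The remark that the choice of elementary stimuli affects the numerical amount of information carried by a unit can be checked on any scrap of text. The sketch below is a rough illustration only: it assumes the usual Shannon measure, estimates probabilities from raw frequencies in one invented sample sentence, and ignores all dependence between successive units. The function name and the sample are hypothetical.

```python
import math
from collections import Counter

def entropy_per_unit(units):
    """Average information per unit, in bits, from observed relative frequencies."""
    counts = Counter(units)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

sample = "the cat saw the dog and the dog saw the cat"

letters = [ch for ch in sample if ch != " "]   # unit of analysis: the letter
words = sample.split()                         # unit of analysis: the word

print("bits per letter:", entropy_per_unit(letters))
print("bits per word:  ", entropy_per_unit(words))
```

The same sentence yields different figures per unit depending on whether letters or words are counted, which is all the passage claims; a serious estimate would need a far larger sample and would have to take the redundancy between neighboring units into account.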

Despite the redundancy the world thrusts on us, human beings are extraor- 
dinary capacitors of information. Man's nostrils make fewer discriminations 
than a dog's, and there are forms of life relatively more sensitive as regards 
touch or hearing or sight; yet, it can be safely affirmed that man extracts from 
his environment a total amount of information greater than that extracted by 
any other organism.1 He does this via learning. Learning, like noise, increases
the amount of information; but whereas noise introduces only spurious
information, learning may do otherwise—indeed, it may weed out of storage spurious
information placed there by noise or by prior learning. Learning will accomplish
these happy results if there is something to be learned and the learning process
is not halted. As a channel of communication man has staggering capacity. For
this reason, man can carve out his environment with a fine chisel to make his
world meet his wishes. Man can. Whether he does or will is a different and
sobering question. Learning is often stymied, comes too late, or is of indifferent
quality.


THE CONSCIOUSLY-ENTERTAINED product of learning is belief. Belief,
behaviorally speaking, is a habit-pattern under the control of the cortex. It
does not, as Peirce points out, incite us to immediate action but “puts us into
such a condition that we shall behave in some certain way, when the occasion
arises.”2 But whether the resultant behavior is appropriate with respect to our


1 Either a great deal that is going on in the human brain and nervous system is
correlated with events in the environment or a preponderance of these neural events is
spontaneous (intrinsically undetermined) or reverberative. But in the latter case the human
being in question would quickly be diagnosed as mad. Most, therefore, of the amount of
information coded in normal cortical events must be considered as transferred from the
environment. And since more goes on in the human brain than in any other, we conclude
that the human being extracts a greater amount of information from his environment than
does any other organism.


2 In an article which has become classic (Cf. Popular Science Monthly, Vol. 12,
November, 1877), Charles S. Peirce investigated and evaluated four avenues by which we
acquire beliefs. That his conclusions are informally derivable from information theory gives
additional support for each. See Charles Hartshorne and Paul Weiss (eds.), The Collected
Works of Charles Peirce, Vol. 5, Chapter IV.

goals depends sensitively on whether our belief contains genuine or spurious
information. The methods we choose for fixing our belief—the communication 
channels we tap for filling our reservoir of beliefs—will critically influence the 
probability-weight, the truth-value, of the information thus acquired. 

There are pre-conscious, even pre-learning, analogs of belief. Watch the 
activity surrounding a bright street lamp some summer night. In complex 
traffic-patterns come the moths. One by one, their wings singed, they fall. If
they survive, again to engage in flight, their previous danger in no way inhibits 
their fatal phototropism. In short, their learning-behavior is negligible. On each 
side of their tiny bodies is a receptor, similar in operation to the photo-electric 
cell, which dispatches neural messages of control to the wing on its side; these 
messages are in such a code that the vigor of the wing's beat is a function of the 
receptor's illumination. So long as the receptors are equally lighted, the moth 
flies “straight and level”; if otherwise, the moth makes a spiral approach to the
light-source. The moth, too, has a habit-pattern, but one which is rigid,
inflexible.3

Our pervasive notion that human beings are “superior” and that moths are,
in truth, “lowly,” is not entirely a vestige of religion nor a token of our usual 
anthropocentrism. Human communications equipment, properly used, could 
maximize far beyond the capacities of other life-forms the conditions for sus- 
tained and even pleasant survival. Faced with sudden and capricious changes in 
his environment, man has methods at his disposal, if he will but use them, to 
combat such threats effectively. Many of the lower animals, by contrast, have 
been irrevocably defeated in the matter of species-survival by even such orderly 
and creeping vicissitudes as the advance of glaciers or evaporation of lakes. 
Moths display a good survival record because they are so prodigally proliferous 
that somewhere on earth's varied surface they are almost sure to find a convivial 
foothold. Other species seek refuge in the static environment of the deep sea. 

3 All stimuli are cybernetic agents (agents of control or communication)—signals,
signs, and symbols. Any stimulus is a signal; a sign is a signal the response to which has
been modified by learning; a symbol is a sign producible by its interpreter and substitutable
for a synonymous sign. Signals include signs, and signs include symbols. Communication
theory applies to all cybernetic agents whereas semiotic, the theory of signs, applies solely
to semiotic agents—signs and symbols. Light, for the moth, is a mere signal with respect
to which it indulges in what Korzybski has termed a signal-reaction, and students of general
semantics have long been aware that this is often detrimental to survival. But
signal-reactions are not always valueless—whether for moths or for human beings! Lepidoptera,
as an order of insects, was abroad on earth long before man, and, relative to present data,
it is not unlikely that following an atomic holocaust of sufficient severity the moths, not
man, will inherit that earth. Neural development necessary for sign-responses and
symbol-responses, involving as it does an absolute number of neurons (around ten billion) far
greater than that available to “lowly” insects, constitutes expensive evolutionary investment;
in the present epoch that investment may prove to have been excessive. Indeed, the
distinctively human mastery of symbols threatens the survival of all life-forms.

But the mechanism of cultural inheritance (Dobzhansky) or time-binding
(Korzybski) bestows on a human being—or, rather, on a human species—a 
greater range of strategies. 

For the moth, very nearly all behavior-responses are fixed, made rigid, by 
biological inheritance. In man, a few hereditary traits—like eye-color—are sim- 
ilarly fixed, others—like skin-color—are relatively plastic, and plasticity is 
maximized in that neural activity we identify with thought, the fixation of belief. 
There is overwhelming evidence that in man’s arsenal of strategies are those 
which would deal effectively with threats to survival which would overpower any 
other animal, and this can be summarized in the statement: Man has maximum 
survivability (i.e., survive-ability). To infer, however, that man will probably out- 
survive contemporary competitors among other species would be a dangerous 
non sequitur. Though plasticity-of-response to environmental change is a prac- 
tically necessary condition of survival in a world choked with contingencies, 
sheer plasticity will neither necessarily nor even practically insure that survival. 
There is no necessary correlation between survivability and probability of sur- 
vival. Whether the survival-strategies available to man will be exploited or are 
destined, rather, to remain wistful possibilities, depends crucially on the efficiency 
of our learning, the methods by which we fix our beliefs, the manner in which 
we deal with the hazard of spurious information. 


AS HUMAN INDIVIDUALS either we learn, or we are carefully coddled by other
individuals who can learn, or we quickly die. The rate at which we learn
governs the amount of information we absorb. The method of learning governs
the signal-to-noise ratio and thus determines what proportion of the information
will be genuine and what spurious. Fantastically, a large segment of society has
adopted a method of learning, a method of fixing belief, which reduces it, as
a social organism, to the level of a helpless imbecile or even the moth. This is
the method of tenacity. It consists in clinging steadfastly to the status quo in
beliefs. Its motto: “What was good enough for father is good enough for me!”
Its battle cry: “Return to the eternal verities, the basic truths, the fundamental
principles, the faith of our forefathers!” Its justification: “These are the beliefs
which befit strength of character, integrity, peace of soul!” Its educational
philosophy: “Teach only that which will not disturb. Publish for the sake of
prestige, but, if your remarks are based on research, do so in a journal sufficiently
obscure that no one will discover what you have found out. The aim of
educational institutions is to teach students what their parents would teach had the
latter the time and pedagogic skill.” Its psychology: stolidity, armoring; quiet
conservatism in normal times; in times of crisis, hysteria and sadomasochism.

By this method human individuals learn only what is told them by
individuals or institutions. (A total schizophrenic learns only what he tells himself.)
This source of learning has been termed “cognitive transfer,” but both genuine
and spurious information are subject to such transfer. On the assumption that a
large number of the beliefs circulating within the “social organism” are true 
or substantially so, a society which invokes this method may for a time fare well. 
Myths and superstitions can sometimes be held on to with little danger, while 
their rejection may mean loss of employment, ostracism, or burning at the stake. 
Nor will belief in heaven be disappointed (though this belief may induce quite 
irrational behavior prior to death). But in prolonged use, the method may 
easily lead to disaster. Old beliefs, even in the absence of noise, may lack 
cogency with respect to new situations (i.e., environmental changes may make 
even old truths irrelevant to new needs), and any new beliefs will be due to 
noise, hence false (spurious information). Too, the old truths are not immune 
to the ravages of noise; it is known well that old truths get contaminated via 
word-of-mouth transmissions and in the semantic change which, with due age, 
all written symbols undergo. Slowly, one by one, the old symbol-strings become 
either false or meaningless or (in extremely rare cases) accidentally true in 
some new interpretation. This is roughly the situation in the moth. The photo- 
receptor “tells” the wing that there is light. The “belief” the wing thus acquires 
is true. But this “true belief” is certainly irrelevant to the new danger which
arises when the moth approaches a light-source hot enough to roast it. And 
the heat itself may so alter the channel of communication between receptor and 
wing that future messages become false or lapse into nonsense. The method of 
tenacity suffers from built-in obliquity: “It would be so nice if what has worked
in the past will work in the future, therefore it probably will.” In effect, this 
method reduces the value of our individual symbol-responses to that of a social 
signal-reaction; it by-passes whatever value our individual and distinctive
learning capacity might otherwise have.

Finally, and fatally, this method affords no justification for the assumption 
that any beliefs will be true. In this lies the poignancy of the story Anatol Rapo- 
port tells (in Science and the Goals of Man, p. 25) of a Moslem scholar pains- 
takingly punctilious in his memorizing of the Koran. Guarding against noise 
is of no value if what is preserved is spurious information, and whether or not 
this is the case cannot be ascertained by the method of tenacity. Genuine
information is not the sort of thing that stores well. Channels of communication,
gorged with genuine information, are like leaky buckets. The only way to keep
them full of information is to assure a proper intake; plugging the leaks serves,
in this case, to hasten the decay of information from genuine to spurious.4


4 “Plugging the leaks” is an apt description of billion-dollar secrecy measures taken in
the name of “security” with respect to our basic research in nuclear physics (the
narrowing of communication channels is a partial plugging of a leak). Similar considerations
apply to loyalty oaths, etc. The sense of security that such measures yield is likely to be
based on spurious information and thus out of all proportion to the real security, if any,
thus implemented; and to the degree that policy-making is dictated by sense rather than
reality, this real security is in fact diminished.


THE METHOD OF AUTHORITY relies on the pronouncements of State and
Church, of parents and teachers, of priests and kings. Beliefs are fixed by 
decree and backed by power—by threats of violence or the promise of reward. 
It is a method of which we as individuals must all make use on first entering 
the world. Without it, it is doubtful whether we would ever learn language; 
the language we first understand is imperative in character and backed by 
power—be it tender or otherwise; words are our masters before they become our 
slaves. Ignorance, too, is a dark power hovering over us, and against the alterna- 
tive of remaining ignorant we do well to consult authority—if by “authority” 
we mean “a source of knowledge.”

Neither the existence nor the usefulness of authorities is denied here, but 
two embarrassing questions require answer: (1) By what method of fixing belief 
do we distinguish real authorities from pseudo-authorities? If we reply: “By
authority,” we fall into a vicious circle. (2) What method of fixing belief does
the authority use? If he answers: “By authority,” he, too, falls eventually into 
a vicious circle. And if the authority uses the method of tenacity and we invoke 
authority, we are, as a group, employing the method of tenacity. 

The method of reason, too, reduces to tenacity. “Reason,” as used by Peirce 
and most other philosophers, denotes deductive inference; this kind of inference 
cannot of itself produce genuinely new knowledge. The premises on which 
logico-mathematical manipulations operate to yield factual conclusions must be 
established by a method other than the method of reason. 


IN THE METHOD OF EXPERIMENT, or scientific method, we discover controls
of our beliefs outside ourselves and our institutions. Assiduously applied, 
it enables us to escape obliquity. And it is a democratic method, available to all 
members of mankind of no matter what age, gender, race, or previous condition 
of intellectual servitude. While by tenacity we are the slaves of the past (and 
entropy), while by authority we are the slaves of those who wield the power, 
and while by pure reason we become unknowing slaves of our own prejudices 
and obliquities, the method of science affords freedom from all these influences. 

Scientizing can be either crude or sophisticated; it is by no means the 
monopoly of the professional. This is important. A relatively crude use of 
scientific method is sufficient to discriminate between pseudo-experts and genuine 
experts—the latter being those who use scientific method directly or who are 
linked to it via cognitive transfer; this method enables us to become expert in 
the detection of experts in fields of inquiry in which we are not expert. In 
rational policy-making with cognitive ends in view, we first detect the experts 
and place them in positions of authority; we then use the method of authority— 
gaining from these experts, by cognitive transfer, the beliefs they fixed by sci- 
entific method. 

Are such beliefs true? Does scientific method enable us to weed out spurious
information, to counteract the effects of noise? The answer is not as congenial
as we might like it to be. According to scientific evidence, it does, but we 
cannot—without obvious circularity (the problem of induction)—rely on this 
evidence for appraising the scientific method itself. 

What we would like, of course, is a method which would guarantee a yield
of true beliefs. Both communication theory and the theory of inductive infer- 
ence agree that no such method exists. Very well, we may pitch our requirements 
more modestly—rest content with a method which will probably yield true 
beliefs. But this involves a guarantee of a probable yield of true beliefs. Again, 
there is no such method. At no point in this infinite regress is there either a 
guarantee or a probability of a probable yield of true beliefs. 

And yet, as the late Hans Reichenbach was able to show, we do have a
guarantee that if our goal be that of arriving at true beliefs, then it is a good
idea to use scientific method. Reichenbach has proved what Peirce apparently
believed without proof, namely: If there is some truth or other to be discovered,
continued application of scientific method must at some time discover it.5 It
cannot be proved that there are not better methods. Nor can we prove anything
about another method without implicit use of scientific method as a tester of 
the candidate method. It is legitimate, however, to use scientific method as a 
tester of other methods; and, thus far, these other methods have been found 
wanting. For a growing number of philosophers this proof is entirely satisfying. 
And it is a fascinating story in itself. 


5 Hans Reichenbach, The Theory of Probability, Ch. 11.





Comparison of living things with machines may seem at first to
be a crude, even rather childish procedure, and it certainly has
limitations; but it has proved to be extraordinarily useful. Machines
are the products of our brains and hands. We therefore understand
them thoroughly and can speak conveniently about other things by
comparing them with machines. The conception of living bodies as
machines, having, as we say, “structures” and “functions,” is at the
basis of the whole modern development of biology and medicine.


J. Z. YOUNG, Doubt and Certainty in Science. 





+ BOOK REVIEWS + 


Manhood of Biology 


DOUBT AND CERTAINTY IN SCIENCE: A BIOLOGIST’S REFLECTIONS ON THE
BRAIN, by J. Z. Young. Oxford: The Clarendon Press. 168 pp. $2.50. 


THIS LITTLE VOLUME is a compilation of the 1950 Reith Lectures given by
Professor J. Z. Young over the B.B.C. Each Lecture is followed by a
“Comment,” which amplifies the necessarily tightly woven material of the
lecture. The result is an outstanding example of the sort of communication
which could bring the most profound achievements of scientific thought and
practice within the range of everyone's understanding. It is clear from the
remarks in the author's preface that an immense amount of work went into
making the lectures “understandable.” In acknowledging the assistance rendered
him in the preparation of the lectures, Professor Young congratulates the British
public on “having a Broadcasting Service that makes available such intelligent,
competent, and patient work to assist its speakers.” I think the B.B.C. is
likewise to be congratulated on having a scientist and teacher of Professor Young's
caliber available for their excellent work in public education.

The excellence of Young's presentation seems to stem from his intense love 
for his subject (the structure, evolution and function of the brain) and his 
profound respect for his audience (any one with an inquisitive mind). These 
attitudes are reflected in that curious mixture of passion and reserve which is 
the mark of the genuine scientist. 

Young's thesis is quite familiar to students of general semantics. Not only is 
symbolic behavior uniquely characteristic of man; it is also man’s most important 
survival mechanism. Therefore not only does psycho-linguistics form a major 
portion of the science of man, but also it serves as a bridge between science and 
ethics. 

The difficulty of constructing a “scientific ethics” has been the following.
If one wishes to make a statement about “The Good” of the broadest validity, 
the only thing one can say with assurance is that survival is good. But this is a 
platitude and does not in itself enable one to make choices among the various 
ways to insure survival. On the other hand, if one includes specific rules of 
conduct in one’s descriptions of “The Good,” one cannot avoid the recognition 
of culture-bound values, of which “scientific evaluation” seems to be only
another instance.

The recognition of the specifically human survival mechanism, that is,
symbolic communication across space and time, resolves this difficulty. Young's great
achievement is in the lucid development of his theme, leading inevitably to the 
ethical conclusion, which N. Wiener had stated so eloquently: “To live well 
means to live with adequate information.”


YOUNG begins with the biologist’s view of man as an animal with a remarkably
developed organ of communication, namely the brain. In describing how 
the ability to communicate insures the survival of the human being, individually 
and collectively, Young is led to the fundamental question on which the science 
of man is based: how does man’s “organ of communication” work? This ques- 
tion leads him naturally into the “philosophy of biology,” that is, the critical 
appraisal of the biological models as they developed through the ages: the primi- 
tive animistic view of biology (which also dominated the primitive view of
the physical world), the advent of the mechanistic view (where organisms ap- 
peared as clockworks), the generalization of the mechanistic view prevalent in 
nineteenth century physiology (where organisms were viewed as heat engines), 
and finally to the impact of communication engineering and information theory 
on the now emerging view of life as an ordering process. This impact may well 
be hailed as ushering in the manhood of biology. 

Having stated his orientation, the author proceeds to describe in a superbly 
lucid exposition the highlights of what is known today in neural anatomy and 
how this knowledge indicates the directions of research into the behavior of 
higher organisms, namely the phenomena of conditioning and learning, of 
memory and purposeful actions. 

The most attractive feature of Young's book is the smooth transition from 
neural anatomy and physiological psychology to questions of semantics, psycho- 
linguistics, the philosophy of science, and ethics. This transition, which is also 
the climax of the exposition, appears in the Fifth and Sixth Lectures. The theme 
is the evolution of human society around the symbols of power and cooperation. 
Especially fascinating is Young’s emphasis on the connection between the power 
to abstract and the potentiality of the unification of mankind. Here as nowhere 
else is the inevitability of scientifically founded trans-cultural ethics presented 
most convincingly. Here also the significance of the title, Doubt and Certainty in
Science, becomes apparent. It becomes clear that philosophy rather than neural
anatomy is the central topic of the book, and that the preceding chapters, whose 
main theme is neural anatomy, serve to describe the human brain as the “organ 
of doubt and certainty.” 


IT IS REMARKABLE that whereas the biological ideas of the nineteenth century
gave rise to harsh views on the meaning and the future of human life
(Malthus, Spencer, Nietzsche), it is from biological ideas that the most
optimistic and charitable views of man are emerging in our century (H. G.
Wells, J. Huxley, Korzybski, Wiener). There is a paradox here. It is the nine- 
teenth century which is usually portrayed by the historians of western civilization 
as the optimistic one and the twentieth as the confused and despairing one. 
Yet when one probes into the prevalent ideas stemming from biology, rather 
the reverse seems to be true. A somewhat closer examination of those ideas 
resolves the paradox. In the nineteenth century it was the man of action, the 
entrepreneur, the empire builder, rather than the thinker, who was the optimist. 
The man of action thrived on the spirit of the age, namely competition, while 
the thinker, in trying to relate this spirit to a “law of nature,” was appalled at
the implications. Today confusion reigns among those who still think in nine- 
teenth century terms of individualistic competition. The biologists, however, 
who are discovering much more profound mechanisms of survival than those 
known in the nineteenth century, have reason for optimism, guarded in the 
writings of Wiener, confident in those of Huxley and Korzybski. Young defi- 
nitely belongs to the latter group of benevolent prophets. 


ANATOL RAPOPORT 


The Multi-ordinal View of Language 


LANGUAGE AND COMMUNICATION, by G. A. Miller. New York: McGraw Hill, 
1951. 298 pp. $5. 


HERE IS A BOOK on language which can serve both as a textbook and an
introduction to the subject for the thoughtful layman. To what subject? 
Not to linguistics in the traditional sense, which confines itself largely to the 
study of phonetic and grammatical structure, and certainly not to the venerable 
curriculum subject, which is called “English” in English speaking countries, 
“German” in German speaking countries, etc., and not to what was called 
“rhetoric” in medieval universities. The subject treated here is a branch of 
psychology, which in general semantics literature is often referred to as psycho- 
linguistics. In short it is a treatment of language as a form of human behavior. 
In this way, psycho-linguistics is, perhaps, coordinate with such other branches 
of psychology as “vision” or “sexual behavior.” I say a branch of psychology
and not physiology, because the behavior of organisms as a whole and in rela- 
tion to other organisms is included. However, the study of the subject begins 
with a description of physiological events, and the attempt is made to link 
these events to the resulting over-all behavior patterns. In this attempt lies the 
justification of the author's statement that his point of view is psychological and 
scientific (that is, behavioristic).


NO CLAIM can be made for psycho-linguistics to rest on a scientific basis
unless this basis consists of what is known about the actual events con- 
comitant to linguistic behavior. The difficulty is in deciding which are the most 
important events in order to single them out for study. In fact, this selection is 
the central problem in psycho-linguistics and in general semantics. Dr. Miller 
does not attempt to make such a selection. In this way, his book offers no 
“theory” of psycho-linguistics, as has been attempted by Korzybski in Science 
and Sanity and by Zipf in The Principle of Least Effort. In fact, we have a 
feeling in reading Miller's modestly conceived volume that perhaps the system- 
building efforts of Korzybski and Zipf have been somewhat premature. 

This is not to say that premature efforts are not often laudable and inspiring. 
However in the development of any science there comes a time when one has to 
gather pertinent facts in a sober way (not with the express purpose of fitting 
them into a scheme) and try to make sense of them in terms of designing 
feasible experimental procedures. Miller's book is essentially a survey of the 
facts which seem pertinent and of experiments which seem relevant. 

It turns out that psycho-linguistics, as most of the “new” sciences now
cropping up, is a “bastard” science. It has to draw on physics, anatomy,
physiology, psychology, linguistics, semantics, and sociology. It should be pointed
out in passing that one of the laudable and inspiring insights in Korzybski’s 
“premature” attempt to construct a science of psycho-linguistics was the recogni- 
tion that such a science must be interdisciplinary. Miller has obviously recog- 
nized this multiple orientation of his subject and has organized his book 
accordingly. 

A mere listing of the chapter headings is indicative of this recognition. The 
chapter entitled “The Phonetic Approach” gives a minimum of background in
the anatomy, physiology, and physics of articulate sound formation. “The 
Perception of Speech” does the same from the point of view of hearing and 
distinguishing articulate sounds. “The Statistical Approach” reviews the attempts
to find large scale regularities in phonetic and verbal outputs, particularly the
extensive work of G. K. Zipf. “The Rules for Using Symbols” and “Words,
Sets, and Thoughts” are chapters on semantics. “The Verbal Behavior of
Children” is an introduction to linguistic ontogeny. “Verbal Habits” and “The Role
of Learning” deal with the psychology of verbal behavior proper. The last
chapter, “The Social Approach,” gives a glimpse into the modern theories of 
communication including the significant work of Bavelas and others in group 
behavior as a function of its communication net and the striking discoveries 
about the role of cultural conditioning in distorting information (made in the 
study of rumors). 

The style is lucid and entertaining throughout. The whole approach to the 
subject is thoroughly modern. For example, the recent advances in the
mathematical theory of information as developed by Shannon and others are skillfully
put to use in several chapters—a difficult task where the reader's mathematical
knowledge cannot be assumed to go beyond the understanding of logarithms. 
Language and Communication should be a valuable asset in the library of every 
serious student of semantics. 


ANATOL RAPOPORT 





As yet the machines receive their impressions through the agency
of man’s senses: one traveling machine calls to another in a shrill
accent of alarm and the other instantly retires; but it is through the 
ears of the driver that the voice of one has acted upon the other. Had 
there been no driver, the callee would have been deaf to the caller. 
There was a time when it must have seemed highly improbable that 
machines should learn to make their wants known by sound, even 
through the ears of man; may we not conceive, then, that a day will 
come when those ears will be no longer needed, and the hearing will 
be done by the delicacy of the machine's own construction?—when 
its language shall have been developed from the cry of animals to a 
speech as intricate as our own? 


—SAMUEL BUTLER, Erewhon. 





+ METALOGUE +


DADDY, HOW MUCH DO YOU KNOW? 
GREGORY BATESON *


DAUGHTER: Daddy, how much do you know?

FATHER: Me? Hm—I have about a pound of knowledge. 

D: Don't be silly. Is it a pound sterling or a pound weight? I mean really 
how much do you know? 

F: Well, my brain weighs about two pounds and I suppose I use about a 
quarter of it—or use it at about a quarter efficiency. So let's say half a pound. 

D: But do you know more than Johnny's daddy? Do you know more than 
I do? 

F: Hm—I once knew a little boy in England who asked his father, “Do
fathers always know more than sons?” and the father said, “Yes.” The next
question was, “Daddy, who invented the steam engine?” and the father said, “James
Watt.” And then the son came back with “—but why didn’t James Watt's father
invent it?”


* * * 


D: I know. I know more than that boy because I know why James Watt's 
father didn’t. It was because somebody else had to think of something else 
before anybody could make a steam engine. I mean something like—I don't 
know—but there was somebody else who had to discover oil before anybody 
could make an engine. 

F: Yes—that makes a difference. I mean, it means that knowledge is all sort 
of knitted together, or woven, like cloth, and each piece of knowledge is only 
meaningful or useful because of the other pieces—and . . . 

D: Do you think we ought to measure it by the yard? 

F: No. I don't. 

D: But that’s how we buy cloth. 

F: Yes. But I didn’t mean that it is cloth. Only it’s like it,—and certainly 
would not be flat like cloth—but in three dimensions—perhaps four dimensions. 


* This is the third of Mr. Bateson’s “metalogues”—or metalinguistic dialogues. He is 
an anthropologist and co-author (with Jurgen Ruesch) of Communication: The Social 
Matrix of Psychiatry (New York: Norton, 1951). 
D: What do you mean, daddy? 

F: I really don’t know my dear. I was just trying to think. 

F: I don’t think we are doing very well this morning. Suppose we start out 
on another tack. What we have to think about is how the pieces of knowledge 
are woven together. How they help each other. 

D: How do they? 

F: Well—it’s as if sometimes two facts get added together and all you have 
is just two facts. But sometimes instead of just adding they multiply—and you 
get four facts. 


D: You cannot multiply one by one and get four. You know you can't. 
F: Oh. 


* * * 


F: But yes I can, too. If the things to be multiplied are pieces of knowledge 
or fact or something like that. Because every one of them is a double something. 

D: I don’t understand. 

F: Well—at least a double something. 

D: Daddy! 

F: Yes—take the game of Twenty Questions. You think of something. Say 
you think of “tomorrow.” All right. Now I ask “Is it abstract?” and you say 
“Yes.” Now from your “yes” I have got a double bit of information. I know 
that it is abstract and I know that it isn’t concrete. Or say it this way—from your 
“yes” I can halve the number of possibilities of what the thing can be. And 
that’s a multiplying by one over two. 

D: Isn't it a division? 

F: Yes—it’s the same thing. I mean—all right—it’s a multiplication by .5. 
The important thing is that it’s not just a subtraction or an addition. 

D: How do you know it isn’t? 

F: How do I know it?—Well, suppose I ask another question which will 
halve the possibilities among the abstractions. And then another. That will have 
brought down the total possibilities to an eighth of what they were at the begin- 
ning. And two times two times two is eight. 

D: And two and two and two is only six. 

F: That's right. 

D: But daddy, I don’t see—what happens with Twenty Questions? 

F: The point is that if I pick my questions properly I can decide between 
two times two times two times two twenty times over things—2²⁰ things. That's 
over a million things that you might have thought of. One question is enough to 
decide between two things; and two questions will decide between four things— 
and so on. 
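
The father's arithmetic can be checked with a short sketch. It assumes an idealized game in which every answer splits the remaining possibilities exactly in half; the function names are hypothetical and chosen only for illustration.

```python
def distinguishable(questions: int) -> int:
    """Number of things that `questions` ideal yes/no answers can tell apart."""
    return 2 ** questions


def questions_needed(things: int) -> int:
    """Fewest ideal yes/no questions needed to single out one of `things`."""
    # (things - 1).bit_length() is the ceiling of log2(things), in exact integers.
    return (things - 1).bit_length() if things > 1 else 0


def play(secret: int, things: int) -> int:
    """Find `secret` among range(things) by repeated halving.

    Returns the number of questions asked. Assumes 0 <= secret < things.
    """
    low, high, asked = 0, things, 0
    while high - low > 1:
        mid = (low + high) // 2
        asked += 1
        if secret < mid:   # "Is it in the lower half?" answered yes
            high = mid
        else:              # answered no
            low = mid
    return asked


if __name__ == "__main__":
    print(distinguishable(1))           # one question decides between two things
    print(distinguishable(2))           # two questions decide between four
    print(distinguishable(20))          # twenty questions: over a million
    print(questions_needed(1_000_000))  # questions needed for a million things
```

Each answer multiplies the number of remaining possibilities by one half, so twenty answers multiply it by one over 2²⁰; adding would never get there.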

D: I don’t like arithmetic, daddy. 

F: Yes, I know. The working it out is dull but some of the ideas in it are 
amusing. Anyhow, you wanted to know how to measure knowledge and if you 
start measuring things that always leads to arithmetic. 

D: We haven't measured any knowledge yet. 

F: No. I know. But we have made a step or two towards knowing how we 
would measure it if we wanted to. And that means we are a little nearer to 
knowing what knowledge is. 

D: That would be a funny sort of knowledge, daddy. I mean knowing about 
knowledge—would we measure that sort of knowing the same way? 

F: Wait a minute—I don't know—that's really the $64 Question on this 
subject. Because—well, let's go back to the game of Twenty Questions. The 
point that we never mentioned is that those questions have to be in a certain 
order. First the wide general question and then the detailed question. And it’s 
only from answers to the wide questions that I know which detailed questions 
to ask. But we counted them all alike. I don’t know. But now you ask me if 
knowing about knowledge would be measured the same way as other knowledge. 
And the answer must surely be no. You see if the early questions in the game 
tell me what questions to ask later, then they must be partly questions about 
knowing. They're exploring the business of knowing. 

D: Daddy—has anybody ever measured how much anybody knew? 

F: Oh yes. Often. But I don’t quite know what the answers meant. They do 
it with examinations and tests and quizzes but it's like trying to find out how 
big a piece of paper is by throwing stones at it. 

D: How do you mean? 

F: I mean—if you throw stones at two pieces of paper from the same dis- 
tance and you find that you hit one piece more often than the other then probably 
the one that you hit most will be bigger than the other. In the same way, in an 
examination you throw a lot of questions at the students, and if you find that 
you hit more pieces of knowledge in one student than in the others then you 
think that student must know more. That's the idea. 
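
The stone-throwing idea can be sketched as a small simulation. Everything specific here is an assumption made for illustration: the floor is taken to be a 1-by-1 square, each sheet of paper lies in one corner of it, stones land uniformly at random, and the sheet sizes and names are hypothetical.

```python
import random


def hits(width: float, height: float, throws: int, rng: random.Random) -> int:
    """Count stones that land on a width-by-height sheet lying in one corner
    of a 1-by-1 floor, when `throws` stones land uniformly at random."""
    count = 0
    for _ in range(throws):
        x, y = rng.random(), rng.random()
        if x < width and y < height:
            count += 1
    return count


def estimated_area(width: float, height: float, throws: int,
                   rng: random.Random) -> float:
    """Estimate the sheet's area as the fraction of throws that hit it."""
    return hits(width, height, throws, rng) / throws


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    small = estimated_area(0.2, 0.2, 20_000, rng)  # true area 0.04
    large = estimated_area(0.5, 0.5, 20_000, rng)  # true area 0.25
    print(small, large)
```

The sheet that is hit more often is probably the bigger one, and the fraction of hits approaches its true area as the number of throws grows. The estimate says nothing about the shape of the sheet, only about its size.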

D: But could one measure a piece of paper that way? 

F: Surely one could. It might even be quite a good way of doing it. We do 
measure a lot of things that way. For example we judge how strong a cup of 
coffee is by looking to see how black it is—that is, we look to see how much 
light is stopped. We throw light waves at it instead of stones, it’s the same idea. 


D: Oh. 


* * * 

D: But then—why shouldn't we measure knowledge that way? 

F: How? By quizzes? No—God forbid. The trouble is that that sort of 
measuring leaves out your point—that there are different sorts of knowledge 
and that there's knowing about knowledge. And ought one to give higher marks 
to the student who can answer the widest question? Or perhaps there should be 
a different sort of marks for each different sort of question. 
D: Well, all right. Let's do that and then add the marks together and 
then... 

F: No—we couldn’t add them together. We might multiply or divide one 
sort of marks by another sort but we couldn’t add them. 

D: Why not, daddy? 

F: Because—because we couldn't. No wonder you don’t like arithmetic if 
they don’t tell you that sort of thing at school—What do they tell you? Golly— 
I wonder what the teachers think arithmetic is about. 

D: What is it about, daddy? 

F: No. Let's stick to the question of how to measure knowledge—Arithmetic 
is a set of tricks for thinking clearly and the only fun in it is just its clarity. 
And the first thing about being clear is not to mix up ideas which are really 
different from each other. The idea of two oranges is really different from the 
idea of two miles. Because if you add them together you only get fog in your 
head. 

D: But daddy, I can’t keep ideas separate. Ought I to do that? 

F: No— No— Of course not. Combine them. But don’t add them. That's all. 
I mean—if the ideas are numbers and you want to combine two different sorts, 
the thing to do is to multiply them by each other. Or divide them by each other. 
And then you'll get some new sort of idea, a new sort of quantity. If you have 
miles in your head, and you have hours in your head, and you divide the miles by 
the hours, you get “miles per hour”—that’s a speed. 

D: Yes, daddy. What would I get if I multiplied them? 

F: Oh—er—I suppose you'd get mile-hours. Yes. I know what they are. 
I mean, what a mile-hour is. It's what you pay a taxi-driver. His meter 
measures miles and he has a clock which measures hours and the meter and the 
clock work together and multiply the hours by the miles and then it multiplies 
the mile-hours by something else which makes mile-hours into dollars. 

D: I did an experiment once. 

F: Yes? 

D: I wanted to find out if I could think two thoughts at the same time. So 
I thought “It’s summer” and I thought “It’s winter.” And then I tried to think 
the two thoughts together. 

F: Yes? 

D: But I found I wasn’t having two thoughts. I was only having one thought 
about having two thoughts. 

F: Sure, that’s just it. You can’t mix thoughts, you can only combine them. 
And in the end, that means you can’t count them. Because counting is really only 
adding things together. And you mostly can’t do that. 

D: Then really do we only have one big thought which has lots of branches— 
lots and lots and lots of branches? 

F: Yes. I think so. I don’t know. Anyhow I think that is a clearer way of 
saying it. I mean it’s clearer than talking about bits of knowledge and trying to 
count them. 


D: Daddy, why don’t you use the other three-quarters of your brain? 


F: Oh, yes—that—you see the trouble is that I had school teachers too. And 
they filled up about a quarter of my brain with fog. And then I read newspapers 
and listened to what other people said, and that filled up another quarter with fog. 

D: And the other quarter, daddy? 

F: Oh—that’s fog that I made for myself when I was trying to think. 





Persecution for the expression of opinions seems to me perfectly 
logical. If you have no doubt of your premises or your power and 
want a certain result with all your heart you naturally express your 
wishes in law and sweep away all opposition. To allow opposition by 
speech seems to indicate that you think the speech impotent or that 
you do not care whole-heartedly for the result, or that you doubt 
either your power or your premises. But when men have realized that 
time has upset many fighting faiths, they may come to believe even 
more than they believe the very foundations of their own conduct 
that the ultimate good desired is better reached by free trade of ideas 
—that the best test of truth is the power of the thought to get itself 
accepted in the competition of the market, and that truth is the only 
ground upon which their wishes safely can be carried out. That at 
any rate is the theory of our Constitution. It is an experiment, as all 
life is an experiment. Every year if not every day we have to wager 
our salvation upon some prophecy based upon imperfect knowledge. 


OLIVER WENDELL HOLMES, Abrams v. United States, 250 U.S. 616 (1919). 